In this thesis we consider the problem of information hiding in the
scenarios of interactive systems, statistical disclosure control, and refine- 
ment of specifications. We apply quantitative approaches to information 
flow in the first two cases, and we propose improvements for the usual 
solutions based on process equivalences for the third case. 

In the first scenario we consider the problem of defining the infor- 
mation leakage in interactive systems where secrets and observables can 
alternate during the computation and influence each other. We show 
that the information-theoretic approach which interprets such systems 
as (simple) noisy channels is not valid. The principle can be recovered, 
however, if we consider channels of a more complicated kind, that in 
information theory are known as channels with memory and feedback. 
We show that there is a complete correspondence between interactive 
systems and these channels, and we propose the use of directed informa- 
tion from input to output as the real measure of leakage in interactive 
systems. We also show that our model is a proper extension of the clas- 
sical one, i.e. in the absence of interactivity the model of channels with 
memory and feedback collapses into the model of memoryless channels 
without feedback. 

In the second scenario we consider the problem of statistical disclo- 
sure control, which concerns how to reveal accurate statistics about a 
set of respondents while preserving the privacy of individuals. We focus 
on the concept of differential privacy, a notion that has become very 
popular in the database community. Roughly, the idea is that a ran- 
domized query mechanism provides sufficient privacy protection if the 
ratio between the probabilities that two adjacent datasets give a certain 
answer is bounded by a constant. We observe the similarity of this goal
with the main concern in the field of information flow, namely limiting 
the possibility of inferring the secret information from the observables. 
We show how to model the query system in terms of an information- 
theoretic channel, and we compare the notion of differential privacy with 
that of min-entropy leakage. We show that differential privacy implies a 
bound on the min-entropy leakage, and we also consider the utility of the 
randomization mechanism, which represents how close the randomized 
answers are, on average, to the real ones. Finally we show that the notion
of differential privacy implies a tight bound on utility, and we propose a 
method that under certain conditions builds an optimal randomization 
mechanism. 
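As an informal illustration of the two notions compared above (not the construction developed in the thesis), the following Python sketch represents a query mechanism as a channel matrix whose rows are datasets and whose columns are answers. It checks the differential-privacy ratio bound on an explicitly given list of adjacent pairs and computes the min-entropy leakage under a given prior, assuming the usual definition of min-entropy leakage as log2 of the ratio between posterior and prior vulnerability. The function names and the one-bit randomized-response mechanism are hypothetical examples.

```python
import math

def is_eps_dp(channel, adjacent, eps, tol=1e-12):
    """Check p(y|x) <= e^eps * p(y|x') for every adjacent pair, in both
    directions, and every answer y. channel[x][y] = p(y | x)."""
    bound = math.exp(eps)
    for x1, x2 in adjacent:
        for a, b in ((x1, x2), (x2, x1)):
            for y, p in enumerate(channel[a]):
                if p > bound * channel[b][y] + tol:
                    return False
    return True

def min_entropy_leakage(prior, channel):
    """log2(posterior vulnerability / prior vulnerability), in bits."""
    prior_vuln = max(prior)
    post_vuln = sum(
        max(prior[x] * channel[x][y] for x in range(len(prior)))
        for y in range(len(channel[0]))
    )
    return math.log2(post_vuln / prior_vuln)

# Hypothetical one-bit randomized response: report the true bit with probability 0.75.
rr = [[0.75, 0.25],
      [0.25, 0.75]]
adjacent = [(0, 1)]          # the two one-record datasets differ in one entry

print(is_eps_dp(rr, adjacent, math.log(3)))  # True: worst-case ratio is 3
print(is_eps_dp(rr, adjacent, 1.0))          # False: e^1 < 3
print(min_entropy_leakage([0.5, 0.5], rr))   # log2(1.5), about 0.585 bits
```

With eps = ln 3 the ratio bound holds exactly at its limit, which is why the check allows a small floating-point tolerance.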

Moving the focus away from quantitative approaches, in the third 
scenario we address the problem of using process equivalences to charac- 
terize information-hiding properties (for instance secrecy, anonymity and 
non-interference). In the literature, some works have used this approach, 
based on the principle that a protocol P with a variable x satisfies such a
property if and only if, for every pair of secrets $s_1$ and $s_2$, $P[s_1/x]$ is
equivalent to $P[s_2/x]$. We show that, in the presence of nondeterminism,
the above principle may rely on the assumption that the scheduler
"works for the benefit of the protocol", and this is usually not a safe
assumption. Non-safe equivalences, in this sense, include complete-trace
equivalence and bisimulation. This problem arises naturally when
refining a specification into an implementation, since usually the former is
more abstract than the latter, and the refinement process involves reduc- 
ing the nondeterminism. The scheduler is, in this sense, a final product 
of the refinement process, after all the nondeterminism is ruled out. We 
present a formalism in which we can specify admissible schedulers and, 
correspondingly, safe versions of complete-trace equivalence and bisimulation.
We prove that safe bisimulation is still a congruence. Finally, we
show that safe equivalences can be used to establish information-hiding 
properties. 



1 Introduction



"There are two mistakes one can make along the road to truth:
not going all the way, and not starting."

Gautama Siddhartha



1.1 Information hiding 

In the last few decades the amount of information flowing through computa- 
tional systems has increased dramatically. Never before in history has a soci- 
ety been so dependent on such a huge amount of information being generated, 
transmitted and processed. This trend is expected to continue in the near
future, if not indefinitely, reinforcing the need for efficient and safe ways
to cope with this reality.

Although the efficient and broad dissemination of information is a goal 
in many situations, there are instances where the disclosure of information is 
undesirable or even unacceptable. The field of information hiding concerns the 
problem of guaranteeing that part of the information relative to an event is kept 
secret. In computer science, the term information hiding encompasses a large
spectrum of fields. Each of them has its own historical motivation, and the
corresponding research has followed its own path. The subfields of
information hiding differ mainly in three respects: (i) what one wants to keep
secret; (ii) from which adversary one wants to keep it secret; and (iii) how
powerful that adversary is.

The field of confidentiality (or secrecy) refers to the problem of keeping 
an action secret. One application of confidentiality is cryptographic proto- 
cols, where the sender and the receiver of a message can be known, but the 
contents of the message itself are considered to be sensitive information. Gen- 
erally, we can say that confidentiality concerns data, while the field of privacy 
concerns people's personal information. When dealing with privacy, we may 
be interested in protecting the information about someone (a credit card
number, for instance) or the person's identity itself. Anonymity is the field that
concerns the protection of the identities of agents involved in events. In
principle, anonymity can concern either the active agent (often the sender
of a message) or the passive agent (often the receiver of a message). For
instance, in the case of a journalist receiving information from a confidential 
source, the identity of the sender is intended to be secret. In the case of
an intelligence agency sending a coded message to a spy, the identity of the 
receiver is confidential information. There is yet another kind of anonymity, 
sometimes referred to as unlinkability, where the identities of the agents and
the actions performed are public information, but the link between each agent
and the action she performed should not be determinable. One example of unlinkability is a
confidential voting system, where both the voters and the final vote count are 
in the public domain, but the relationship between the voters' identities and 
the ballots cast is protected. 

One application of privacy that has drawn a lot of attention in recent years 
is the problem of statistical databases. A statistic is a quantity computed from 
a sample, and the goal of statistical disclosure control is to enable the user of the 
database to learn properties of the population as a whole, while maintaining 
the privacy of individuals in the sample. The field of statistical databases 
highlights the delicate equilibrium between the benefits and the drawbacks of 
the spread of information. A practical example occurs in medical research, 
where it is desirable that a great number of individuals agree to give their 
personal medical information. With the information acquired, researchers or 
public authorities can calculate a series of statistics from the sample (such as 
the average age of people with a particular condition) and decide, say, how 
much money the health care system should spend next year in the treatment 
of a specific disease. It is in the interest of each individual, however, that her 
participation in the sample will not harm her privacy. In our example, the 
individuals usually do not want their specific status with respect to a given
disease to be disclosed, not even to the users querying the database. Some
studies, e.g. [Joi01], suggest that when individuals are guaranteed anonymity
and privacy they tend to be more cooperative in giving personal information. 

Another important field of information hiding is information flow, which 
concerns the leakage of classified information via public outputs in programs 
and systems. Consider a system that asks users for a password before granting
them access to some functionality. Naturally, the password itself is intended to be
secret; however, an attacker trying to guess it will always get an observable
reaction from the system, whether the response is an acceptance or a rejection
of the entered code. In either case, the observable behavior of the system
reveals some information about the password: even if the guess is wrong,
the search space is narrowed (even if, in this case, only slightly).
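To make the narrowing of the search space concrete, here is a minimal Python sketch that measures the attacker's uncertainty with Shannon entropy (defined formally later in the thesis). It assumes the password is drawn uniformly from a small, hypothetical space of 16 candidates and that the attacker tries one fixed guess; the function names are illustrative.

```python
import math

def uniform_entropy(n):
    """Shannon entropy, in bits, of a uniform distribution over n candidates."""
    return math.log2(n)

def guess_leakage(space_size):
    """Expected entropy reduction from observing accept/reject for one fixed
    guess, assuming the password is uniform over space_size candidates."""
    p_accept = 1 / space_size
    before = uniform_entropy(space_size)
    # Accepted: the password is known (entropy 0).
    # Rejected: space_size - 1 equally likely candidates remain.
    after = p_accept * 0.0 + (1 - p_accept) * uniform_entropy(space_size - 1)
    return before - after

print(uniform_entropy(16))   # 4.0 bits before the guess
print(uniform_entropy(15))   # about 3.907 bits after a rejection
print(guess_leakage(16))     # about 0.337 bits on average
```

A rejection removes less than a tenth of a bit, which matches the remark that the search space is narrowed only slightly.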

It is important to note that the subdivisions of information hiding are not 
mutually exclusive. In a system where public outputs can reveal the identity of
agents, for instance, both the problems of information flow and of anonymity 
are present. The classification is usually based more on the contextual mo- 
tivation for the problem than on a rigid taxonomy of subfields. In fact, in 
recent years there has been an active line of research exploring the similarities
between the foundations of anonymity and information flow, and between
privacy and information flow. The result has been an increasing convergence
between these fields. In this thesis we explore the similarities between
information flow, statistical databases, and anonymity. 

In a broader context, the importance of information hiding goes far beyond 
the realm of computer science, and there are many subtle questions that need
to be considered carefully. From a political and even philosophical perspective, 
the unrestricted use of privacy protection can be controversial. Even though 
it is broadly accepted that people should have the right to exchange e-mails 
privately, to vote in democratic elections anonymously, and to express their 
ideas on the Internet freely, there are situations where information protection 
policies can be argued to have serious drawbacks. The same mechanism that 
grants a political activist anonymity and free speech on the Internet, while 
living under a repressive government, also grants a pedophile anonymity to 
broadcast harmful material. This balance between freedom and control in the 
virtual media has been the subject of passionate discussion. Independently of 
whether one's goal is to maximize or to minimize the degree of information 
protection in a given situation, it is desirable in any case to measure the
extent to which the information is protected, to determine which notion of
protection applies, and to establish from whom the information is protected.

In this thesis we avoid the controversy of deciding in which cases the appli- 
cation and extent of information hiding methods are justifiable. Rather, our 
focus is on measuring the degree of information protection offered by a system, 
thus making evaluation and comparison of different systems possible. Specifically,
we are interested in using concepts of information theory to quantify the
leakage of information. 



1.2 Qualitative and quantitative approaches to 
information hiding: a brief history 

Historically, the research on information hiding has evolved from the simple 
but imprecise qualitative approach toward the more refined, but at the same 
time more complex, quantitative approach. In the following sections we will 
briefly overview both. We do not intend to provide here an exhaustive study of 
the subject, but rather to highlight some of the most important contributions 
of each of these lines of research to the field of information hiding. 
1.2.1 The qualitative approach

The qualitative approach emerged first in the literature of information hiding.
The central idea is that, by observing the output of a system, the adversary 
cannot be completely sure of what the secret information is. The principle of 
confusion says that for every observable output generated by a secret input, 
there is another secret that could also have generated the same output. In 
anonymity, for instance, this corresponds to the concept of possible innocence, 
i.e. the impossibility of identifying the culprit with certainty by only observing 
the system's output. The principle of confusion does not take into considera- 
tion the adversary's certainty about the value of the secret: it is enough that 
there be an alternative hypothesis, no matter how unlikely it is. This is also 
known as the possibilistic approach.

One of the first developments in this field dates from 1976, when Bell and 
La Padula defined the model of multilevel security systems [BLP76]. In this 
model the components of a system are classified as either subjects, i.e. active 
entities such as users or processes, or as objects, i.e. passive entities such as files. 
The subjects are divided into trusted and untrusted entities, and the authors
define restrictions on how untrusted subjects may access objects. The rule "no read up,
no write down" states that untrusted subjects can read only from objects of the
same or lower levels, and that they can only write into objects of the same or
higher levels. This model was developed to support different levels of security, 
and aimed to ensure that information only flows from lower to higher levels and 
never in the opposite direction. Each input into and output from the system 
is labeled with a security level. Any pair of an input and its corresponding 
output is called an event. A view of a security level l corresponds to the events
at level l or lower, and all the events of higher levels are hidden from level l.
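The access rules and the notion of view just described can be sketched in a few lines of Python. This is a minimal illustration, assuming only two totally ordered levels and untrusted subjects (a simplification; richer models order levels as a lattice). The `Level` enumeration, the function names and the sample events are hypothetical.

```python
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical security levels, ordered from low to high."""
    LOW = 0
    HIGH = 1

def can_read(subject: Level, obj: Level) -> bool:
    """No read up: an untrusted subject reads only objects at its level or lower."""
    return obj <= subject

def can_write(subject: Level, obj: Level) -> bool:
    """No write down: an untrusted subject writes only objects at its level or higher."""
    return obj >= subject

def view(events, level):
    """The view of `level`: events at that level or lower; higher events are hidden."""
    return [(lvl, name) for (lvl, name) in events if lvl <= level]

events = [(Level.LOW, "login"), (Level.HIGH, "read_secret"), (Level.LOW, "logout")]
print(can_read(Level.HIGH, Level.LOW), can_read(Level.LOW, Level.HIGH))    # True False
print(can_write(Level.LOW, Level.HIGH), can_write(Level.HIGH, Level.LOW))  # True False
print([name for _, name in view(events, Level.LOW)])  # ['login', 'logout']
```

Together, the two rules mean that data can move only upward: a low subject may write into a high object, and a high subject may read from a low one, but never the reverse.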

Usually in this model only two levels are distinguished: high and low. 
The high level corresponds to sensitive information, which should only be 
available to some users with special privileges, while the low level corresponds 
to public information accessible to everyone. The goal of secure information 
flow analysis is, in this context, to avoid leakage from the high level to the low 
level. 

Bell and La Padula's model, however, did not address the problem of leakage
of information due to covert channels. A covert channel is a way of transmitting
information from the high to the low environment by means not designed
or intended for this purpose. Consider, for instance, a system where a
low user ℓ can send a file to a high user h, and h has the power to redefine the
access rights to the file. The user h can either maintain the permission of ℓ to
write in the file, or she can change the policy so that ℓ no longer has access to it. In
this scenario, a covert channel between a corrupted high user h and the low user ℓ
can be established as follows. The low user sends a file to the high user, who
then uses her power of granting or denying ℓ further access to it to encode
a message. At a later stage, ℓ tries to write in the file, and an
access failure can be interpreted as the bit 0, while a success can be interpreted
as the bit 1. By repeating this procedure, any message can eventually be sent through the covert
channel from the corrupted high user to the low one.
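The covert channel just described can be simulated directly. The following Python sketch assumes one fresh file per transmitted bit and that write attempts never fail for any other reason. The `FileSystem` class and the `send`/`receive` helpers are hypothetical names introduced only for this illustration.

```python
class FileSystem:
    """Toy file system in which the high user controls the low user's write permission."""

    def __init__(self):
        self.low_can_write = {}

    def create(self, name):
        # The low user creates a file and hands it to the high user; writable by default.
        self.low_can_write[name] = True

    def set_low_write(self, name, allowed):
        # The high user's legitimate power: redefine the access rights to the file.
        self.low_can_write[name] = allowed

    def low_write(self, name):
        # The low user's write attempt: success or access failure is observable.
        return self.low_can_write[name]

def send(fs, bits):
    """High user: encode bit 1 as 'keep access', bit 0 as 'deny access'."""
    for i, bit in enumerate(bits):
        fs.set_low_write(f"f{i}", bit == 1)

def receive(fs, n):
    """Low user: decode access failure as 0 and success as 1."""
    return [1 if fs.low_write(f"f{i}") else 0 for i in range(n)]

message = [1, 0, 1, 1, 0]
fs = FileSystem()
for i in range(len(message)):
    fs.create(f"f{i}")
send(fs, message)
print(receive(fs, len(message)))  # [1, 0, 1, 1, 0]
```

No operation here is forbidden by a read/write policy. The leak comes entirely from the observable outcome of a permitted access check.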

To cope with the threat of covert channels, Goguen and Meseguer developed
the concept of noninterference [GM82]. A system is noninterfering when
the actions of high users do not alter what can be seen by low users. In other 
words, the low outputs of the system will only reflect the values of the low 
inputs, independently of what the high inputs are (if any). The authors pro- 
posed a model of noninterference that separated the system from the security 
policies. Their model, nevertheless, was only appropriate for deterministic 
systems. 

Noninterference, however, may be too restrictive a concept for several
practical applications. It does not allow, for instance, the summarization of data.
It is often the case that a system allows statistical (or summarizing)
functions (e.g. mean, total number) to be calculated on its high inputs and then
disclosed to low users, even if the high inputs themselves are supposed to be 
kept secret. These systems are typical in the area of statistical databases, and 
we will discuss this issue in more detail in Section 1.3.2. Clearly, a system 
that allows the summarization of high data for the low environment violates 
noninterference, since a change in the high input may affect the low output.
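For deterministic programs over small finite domains, noninterference can be checked by brute force: for each low input, the low output must be the same whatever the high input is. The following Python sketch, assuming the program is given as a function of (high, low) and using hypothetical function names, shows that a program which only echoes its low input passes the check, while one that publishes the mean of the high inputs fails it.

```python
from itertools import product
from statistics import mean

def noninterfering(program, highs, lows):
    """True if, for every low input, the low output does not depend on the high input."""
    for low in lows:
        outputs = {program(high, low) for high in highs}
        if len(outputs) > 1:
            return False
    return True

highs = list(product(range(3), repeat=2))  # pairs of secret values in {0, 1, 2}
lows = [0, 1]

def echo_low(high, low):
    return low           # the low output reflects only the low input

def average_high(high, low):
    return mean(high)    # a statistical summary of the high inputs

print(noninterfering(echo_low, highs, lows))      # True
print(noninterfering(average_high, highs, lows))  # False
```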

Considering this problem, in 1986 Sutherland [D.S86] proposed the concept
of nondeducibility on inputs, which focuses not on whether the output
is affected by a change in the input, but on whether it is possible
to deduce the input from the output. Under this definition, a system may 
allow summarization of data and still be secure, since the output of a sta- 
tistical function does not necessarily allow the adversary to deduce what the 
inputs are. One drawback of the concept of nondeducibility on inputs is that
it assumes that the strongest form of the principle of confusion is enough to 
ensure security. Notably, it relies on the assumption that "no high value can 
be ruled out after observing a low value". This is not a strong enough security 
guarantee in many real systems. In some cases, even if no high value can be 
ruled out as a possibility, a single value (or a small set of values) can be much 
more likely than the others, and in practice it makes little sense to consider 
the alternatives. This criticism can be seen as an early attempt at a
quantitative approach to information flow, where what matters is "how much"
an attacker learns (or does not learn) about the secret.

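The criticism can be illustrated numerically. In the following Python sketch, with a hypothetical prior and channel matrix, Bayes' rule is applied after observing a low output. No secret is ruled out, since every posterior probability stays positive and a purely possibilistic condition is therefore satisfied. Nevertheless, one secret becomes far more likely than all the others.

```python
def posterior(prior, channel, y):
    """Bayes' rule: p(x | y) for a channel given as rows channel[x][y] = p(y | x)."""
    joint = [prior[x] * channel[x][y] for x in range(len(prior))]
    total = sum(joint)
    return [j / total for j in joint]

prior = [0.25, 0.25, 0.25, 0.25]   # four equally likely secrets
channel = [[0.97, 0.03],           # hypothetical p(y | x): secret 0 usually yields y = 0
           [0.01, 0.99],
           [0.01, 0.99],
           [0.01, 0.99]]

post = posterior(prior, channel, 0)
print(all(p > 0 for p in post))    # True: no secret can be ruled out
print(max(post))                   # yet secret 0 now has probability about 0.97
```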
Another important issue in security systems is the problem of composi- 
tionality. In [McC87], McCullough pointed out the importance of hook-up 
security, i.e. the compositionality of multi-user systems. Usually, real systems 
are far too complex to be analyzed as a whole, especially because the task 
of designing and implementing a system is normally divided between teams. 
Each team is responsible for a number of components that, in a later stage, 
will be put to work together. It is highly desirable that security properties 
be verified in each component separately, and that this verification guarantee 
that the final composite system is also secure. McCullough showed that the
concepts of multilevel security systems, noninterference, and nondeducibility 
on inputs are not composable. As a replacement, he proposed the concept of 
restrictiveness, according to which no high level information should affect the 
behavior of the system, as seen by a low user. 

In [WJ90] Wittbold and Johnson addressed the question of nondeducibility 
on inputs under a different perspective, showing that it is not a guarantee of 
absence of leakage. Consider the following algorithm, where H and L stand for 
the high and the low environments, respectively. Here we assume the variables
x and y are binary, and the randomized command $x := 0 \oplus_{0.5} 1$ assigns to x
either the value 0 or the value 1, with probability 0.5 each.

while true do
    x := 0 ⊕_{0.5} 1;
    output x to H;
    input y from H;
    output (x XOR y) to L;
end while

In the above algorithm, the low environment only has access to the value (x 
XOR y). Note, however, that the high environment H learns the value of x 
before having to choose the value of y, and therefore it can use this knowledge 
to encode a message: to transmit the bit 0, H chooses y = x, and to transmit
the bit 1, H chooses y = 1 - x. In this way H can send one bit to L in each
iteration of the loop, even though L cannot deduce the high
input y from the low output (x XOR y). Hence, satisfying nondeducibility
on inputs does not guarantee that a system is secure. Wittbold and Johnson
defined, then, the concept of nondeducibility on strategies, which means that 
regardless of what view L has of the machine, no strategy is excluded from 
being used by H. 
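The flow in the algorithm above can be checked by simulation. The following Python sketch assumes that H follows the encoding strategy just described and that L simply records each low output. The function name `run` is hypothetical. The sketch verifies that L recovers H's message exactly, and that for a fixed y the low output is 0 or 1 with equal probability, so the output alone reveals nothing about y.

```python
import random

def run(message, rng):
    """Simulate the loop above; H encodes `message`, L sees only x XOR y."""
    low_view = []
    for bit in message:
        x = rng.randint(0, 1)       # x := 0 (+)_0.5 1, a fair random bit
        # H has seen x, so it picks y making x XOR y equal to the bit it wants to send.
        y = x if bit == 0 else 1 - x
        low_view.append(x ^ y)      # output (x XOR y) to L
    return low_view

rng = random.Random(0)
msg = [1, 0, 0, 1, 1, 0, 1]
print(run(msg, rng) == msg)          # True: L receives H's message exactly

# For a fixed y, x XOR y takes each value for exactly one of the two equally
# likely values of x, so L's observation is uniform whatever y is.
for y in (0, 1):
    print(sorted(x ^ y for x in (0, 1)))   # [0, 1]
```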

1.2.2 The quantitative approach 

The qualitative approach, although simple and easy to apply, does not reflect 
reality in many practical situations. In many cases some information leakage 
is tolerable or even intentional. Take an election protocol. After the final vote 
count is released, there are fewer possible hypotheses concerning who voted for 
whom than the hypotheses available before the votes were cast. In this exam- 
ple there is a natural leakage of information, since the uncertainty about the 
sensitive information decreases after the observation of the protocol's output. 
This leakage occurs, however, as a necessary functionality of the protocol. 

In fact, in most real systems noninterference cannot be achieved, as typical 
systems will always leak some information. This does not mean, however, that 
all systems are equally good or bad, because the amount of leakage usually 
varies from system to system. Therefore it is important to quantify how much
leakage a system allows. Quantitative methods are useful to evaluate the extent 
to which a system is secure, and to compare it to other systems. 

One of the first attempts to quantify information leakage was made by 
Denning in 1982. In [Den82] she defined the leakage from a state s to a state 
s' as the decrease in uncertainty about the high information in s resulting 
from the low information in s'. She used the concept of conditional entropy¹
$H(h_s \mid \ell_{s'})$, where $h_s$ is the high information in s and $\ell_{s'}$ is the low information
in s'. Her definition of leakage was:

$$M_1 = H(h_s \mid \ell_s) - H(h_s \mid \ell_{s'})$$

If the quantity $M_1$ is positive, then it is considered to be the leakage of
information. This measure of leakage, however, does not consider the history of
low inputs, a problem pointed out by Clark, Hunt and Malacaria in [CHM07].
Without the history one cannot account for the increase in knowledge (or
decrease in uncertainty) that accumulates between the low states s and s'. They
proposed, instead, the following measure of leakage:

$$M_2 = H(h_s \mid \ell_s) - H(h_s \mid \ell_{s'}, \ell_s)$$

Since $H(X \mid Y, Z) \le H(X \mid Y)$ for all random variables X, Y and Z (here taking
$X = h_s$, $Y = \ell_{s'}$ and $Z = \ell_s$), we have $M_1 \le M_2$. The quantity $M_2$
corresponds to the Shannon conditional mutual information $I(h_s; \ell_{s'} \mid \ell_s)$.
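The difference between the two measures can be seen on a small example. The following Python sketch uses a hypothetical joint distribution: the secret $h_s$ is two uniform bits, the initial low state $\ell_s$ already holds the low-order bit, and the final low state $\ell_{s'}$ holds the high-order bit. Conditional entropies are computed as H(X | Y) = H(X, Y) - H(Y). Each low observation alone leaves one bit of uncertainty, so $M_1 = 0$, while the two together reveal $h_s$ completely, so $M_2 = 1$.

```python
import math
from collections import defaultdict

def entropy(joint, idx):
    """Shannon entropy (bits) of the marginal of `joint` on the coordinates in idx."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return sum(-p * math.log2(p) for p in marg.values() if p > 0)

def cond_entropy(joint, x, given):
    """H(X | given) = H(X, given) - H(given)."""
    return entropy(joint, x + given) - entropy(joint, given)

H_S, L_S, L_S2 = 0, 1, 2   # coordinate positions of h_s, l_s and l_s'
# Hypothetical example: h_s uniform on {0,1,2,3}, l_s = low bit, l_s' = high bit.
joint = {(h, h % 2, h // 2): 0.25 for h in range(4)}

m1 = cond_entropy(joint, [H_S], [L_S]) - cond_entropy(joint, [H_S], [L_S2])
m2 = cond_entropy(joint, [H_S], [L_S]) - cond_entropy(joint, [H_S], [L_S2, L_S])
print(m1, m2)   # 0.0 1.0
```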

In 1987, Millen made a formal connection between information flow and
Shannon information theory by relating noninterference and mutual information
[Mil87]. In Millen's model, a computer system is seen as a channel whose
input is a sequence W, possibly generated by a set of users, and whose output
(after the computation is completed) is Y. The random variable X represents
a subsequence of W generated by a user U, while $\bar{X}$ represents the high inputs
generated by users other than U. Millen showed that in deterministic systems
if X and $\bar{X}$ are independent and X is not interfering with Y, then the Shannon
mutual information $I(X; Y)$ between X and Y is zero. In other words,
noninterference is a sufficient condition for absence of information flow.
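A minimal Python sketch of the quantity involved: given an input distribution and a channel matrix p(y | x) over finite sets, it computes I(X; Y) in bits. This only illustrates the consequence of Millen's result, not his model. When the output distribution does not depend on X, all rows of the matrix are equal and the mutual information is zero. A channel that copies X leaks all of H(X). The function name and the example matrices are hypothetical.

```python
import math

def mutual_information(prior, channel):
    """I(X; Y) in bits for input distribution `prior` and rows channel[x][y] = p(y | x)."""
    n_y = len(channel[0])
    p_y = [sum(prior[x] * channel[x][y] for x in range(len(prior))) for y in range(n_y)]
    total = 0.0
    for x, px in enumerate(prior):
        for y in range(n_y):
            pxy = px * channel[x][y]
            if pxy > 0:
                total += pxy * math.log2(pxy / (px * p_y[y]))
    return total

prior = [0.5, 0.5]
independent = [[0.3, 0.7], [0.3, 0.7]]   # output does not depend on X
copy = [[1.0, 0.0], [0.0, 1.0]]          # output reveals X completely

print(mutual_information(prior, independent))  # 0.0
print(mutual_information(prior, copy))         # 1.0
```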

In 1990, Massey made an important contribution to the field of information
theory, which influenced the further development of quantitative information
flow. In [Mas90] he showed that the usual definition of discrete memoryless
(i.e. history-independent) channels used at that time did not in fact take into
account the possible use of feedback. He highlighted the conceptual

¹ The concepts of entropy, conditional entropy and mutual information will be defined
formally in Chapter 3. For the moment it is enough to know that entropy is a measure of 
the uncertainty of a random variable; conditional entropy is a measure of the uncertainty of 
one random variable given another random variable; and mutual information is a measure 
of how much information two random variables share. 
difference between causality and statistical dependence, and presented an
accurate mathematical description of discrete memoryless channels that allowed
feedback. Then he introduced the concept of directed information, which cap- 
tures the idea of causality between the input and the output of a channel, and 
argued that in the presence of feedback, directed information is a more appro- 
priate measure of the flow of information from input to output than mutual 
information. 
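The gap between the two quantities can be seen on a two-step example with feedback. The following Python sketch assumes the standard form of directed information, $I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$. It uses a hypothetical channel whose first output $Y_1$ is a fair coin independent of the input, whose second input $X_2$ merely copies $Y_1$ back, and in which $X_1$ and $Y_2$ are constant. Mutual information $I(X^2; Y^2)$ reports one bit of dependence, while directed information correctly reports that nothing flows from input to output.

```python
import math
from collections import defaultdict

def H(joint, idx):
    """Entropy (bits) of the marginal on coordinates idx; no coordinates gives 0."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return sum(-p * math.log2(p) for p in marg.values() if p > 0)

def cmi(joint, a, b, c):
    """Conditional mutual information I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return H(joint, a + c) + H(joint, b + c) - H(joint, a + b + c) - H(joint, c)

X1, X2, Y1, Y2 = 0, 1, 2, 3        # coordinate positions in each outcome tuple
# Outcomes (x1, x2, y1, y2): y1 is a fair coin, x2 = y1 by feedback, x1 = y2 = 0.
joint = {(0, y1, y1, 0): 0.5 for y1 in (0, 1)}

mutual = cmi(joint, [X1, X2], [Y1, Y2], [])
directed = cmi(joint, [X1], [Y1], []) + cmi(joint, [X1, X2], [Y2], [Y1])
print(mutual, directed)   # 1.0 0.0
```

The one bit counted by mutual information comes entirely from the feedback path (the output influencing the next input), which is exactly the part that directed information excludes.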

In the same year, McLean also considered the concept of time in the de- 
scription of systems by proposing his Flow Model [McL90]. According to this 
model, there is a flow of information only when a high user H assigns values 
to objects in a state that precedes the state in which a low user L makes her 
assignment. In this situation only part of the correlation between high and low 
information is considered as leakage. This addressed the problem of causality, 
but this model was too general, and relatively difficult to apply. 

In [Gra91] Gray worked on bridging the gap between the overly compli- 
cated Flow Model and the more practical, yet restricted, approach of Millen. 
Gray used a general-purpose probabilistic (as opposed to nondeterministic)
state machine that resembled Millen's model. In Gray's model, the value
T(s, I, s', O) represents the probability of a given state s evolving into another
state s', under the input I, and producing output O. The channels are
partitioned into two sets, H and L, representing the channels connected to high
and low processes, respectively. The high and the low environments can com- 
municate only through their interactions with the system, as no other form 
of communication between them is allowed. Gray wanted to take time and 
causality into consideration in his definition of leakage, and he did so by allow- 
ing feedback and memory in his model. His formulation of a security guarantee 
was the following: 

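$$P(L_{in} \cap L_{out} \cap H_{in} \cap H_{out}) > 0 \implies P(\ell \mid L_{in} \cap L_{out} \cap H_{in} \cap H_{out}) = P(\ell \mid L_{in} \cap L_{out}) \qquad (1.1)$$

where $L_{in}$ and $L_{out}$ represent the history of low inputs and outputs, respectively,
and $H_{in}$ and $H_{out}$ represent the history of high inputs and outputs, respectively.
The symbol $\ell$ represents the final output event on the channels of the low environment.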
The formulation (1.1) states that the probability of a low output may depend 
on the previous history of the low environment, but not on the previous history 
of the high environment. 

Gray also tried to generalize the concept of capacity to the case of channels
with memory and feedback. He provided a formula expressing the flow of
information from the whole history of inputs and outputs (during the time
period ... t - 1) to the low output (at time t), and conjectured that the
capacity of the channel would be:

$$C = \lim_{n \to \infty} C_n \qquad (1.2)$$
where

$$C_n \stackrel{\text{def}}{=} \max \frac{1}{n} \sum_{t=1}^{n} I\big(\mathit{In\_Seq\_Event}_{H,t}, \mathit{Out\_Seq\_Event}_{H,t} \,;\, \mathit{Final\_Out\_Event}_{L,t} \mid \mathit{In\_Seq\_Event}_{L,t}, \mathit{Out\_Seq\_Event}_{L,t}\big) \qquad (1.3)$$

and $\mathit{In\_Seq\_Event}_{A,t}$ is the input history at channel A (where A stands for
L or H) up to time t - 1, $\mathit{Out\_Seq\_Event}_{A,t}$ is the output history at channel
A up to time t - 1, and $\mathit{Final\_Out\_Event}_{L,t}$ is the low output event at time
t. Gray showed that the absence of information flow implies that capacity as
formulated in (1.2) is zero. He also conjectured that this definition of capacity 
would correspond to the notion of maximum transmission rate supported by 
the channel. As pointed out in [AAP11], however, the problem with Gray's
conjecture is the following. For an output at time t, the only causal relation 
considered is the one with the history of inputs up to time t — 1, while the effect 
that the input at time t itself may have on the output is ignored. In this way, 
(1.2) does not express the complete causal relation between input and output. 
The correct notion of capacity in the presence of memory and feedback, which 
corresponds to the maximum transmission rate for the channel, was proposed 
in 2009 by Tatikonda and Mitter [TM09], and it will be discussed later on in 
Chapter 4. 

A similar formal approach, although with different motivations, was
presented by McIver and Morgan in [MM03]. They focused on the problem of
preserving security guarantees while refining specifications into
implementations. The authors used an equation similar to (1.3), but in the context of
sequential programming languages enriched with probabilities. Their aim was
to protect the high values during the whole execution of the program, instead 
of the initial high values only. In other words, they wanted to ensure that if the
high information is not known by the low environment at the beginning of the 
computation, then it cannot be inferred at any later stage. They proved that, 
for deterministic programs, if the final values of the high objects are protected, 
then the initial values are protected as well. McIver and Morgan also defined
the concept of information escape as: 

$$H(h \mid l) - H(h' \mid l')$$

where $H(h \mid l)$ represents the uncertainty (conditional entropy) of the high
information given the low information at the beginning of the computation, and
$H(h' \mid l')$ represents the same uncertainty at the end of the computation. They
defined the channel capacity as the least upper bound of information escape 
over all possible input distributions. In this context, a system is considered 
secure if it has capacity equal to zero. One advantage of this model is that it 
is not necessary to keep track of the whole history of the computation, but on 
the other hand it can be applied only in scenarios where the adversary does 
not have memory. 




In Chapter 3 we will take up again the discussion of quantitative approaches 
to information flow based on information theory. For the moment we will focus 
on some topics related to information hiding that are of special relevance for 
this thesis. 

1.3 Case studies of information hiding 

In this section we present three case studies of information hiding that we 
address in this thesis. 

1. The case of quantitative information flow, i.e. how much about the secret 
information an adversary can learn by observing the system's output, 
and by knowing how the system works. We give special attention to the 
broadly studied problem of anonymity, which can be seen as a particular 
case of the more general problem of information flow where the secret 
information is the identity of the agents. 

2. The question of statistical disclosure control, which concerns the problem 
of allowing users of a database to obtain meaningful answers to statisti- 
cal queries, while protecting the privacy of the individuals participating 
in the database. We focus on differential privacy, an approach to this 
problem that has drawn a lot of attention in recent years. 

3. The problem of preserving security guarantees while deriving implemen- 
tations from specifications. Usually specifications are more abstract than 
implementations, i.e. they present more nondeterminism. The task of 
implementing a system reduces the nondeterminism of the specification, 
and if it is not done carefully, an implementation may rule out possibilities
allowed by the specification that are essential for the security guarantees.

1.3.1 Quantitative information flow and anonymity 

Anonymity is one of the most studied subjects of information hiding. The 
research in this area has been active in the past several years, and the advances 
made can be extended to the more general scenario of information flow. As 
briefly introduced in Section 1.1, anonymity concerns the protection of the 
identities of the agents involved in the events. 

With the advent of the Internet, the protection of anonymity has become an 
issue in the daily life of millions of people around the world. The importance 
of anonymity is even more evident concerning the protection of freedom of 
speech, a situation that is particularly delicate in countries under repressive 
regimes. 

Pfitzmann, Dresden and Hansen [PDH08] have proposed a standard termi- 
nology for anonymity concepts. In their work there are three different notions 
of anonymity based on the agents involved: 
• Sender anonymity: when the identity of the originator should be protected;

• Receiver anonymity: when the identity of the recipient should be pro- 
tected; 

• Unlinkability: when it might be known that an agent A originated a
message and an agent B received a message, yet it should not be known 
whether the message sent by A was actually the one received by B. 

Reiter and Rubin also gave a classification of the types of adversary in 
an anonymity system in [RR98], where they also proposed the anonymity 
protocol Crowds (see Section 1.3.1). In their work, they considered that the 
adversary can be an eavesdropper simply observing the traffic of messages on 
the network, or she can be an active attacker (i.e. a collaboration between 
senders, between receivers, or between others taking part in the system), or 
even a combination of the previous two types. The authors also defined a 
hierarchy of anonymity degrees that a system can provide. In decreasing order 
of strength, the proposed scale is listed below. In this list, let s,s' denote 
secrets and o an observable, i.e. a particular action or output of the system 
that is distinguishable from the point of view of the attacker. 

Strong anonymity From the attacker's point of view, the observables pro- 
duced by the system do not increase her knowledge about the secret 
information, i.e. the identity of the individual involved in an event. 
Chaum also described the concept of strong anonymity in his work on
the Dining Cryptographers protocol [Cha88]. It represents the ideal
situation where the execution of the protocol does not give the adversary
any extra information about the secrets. The concept was formalized as
follows.

$$\forall s, o \quad p(s \mid o) = p(s) \qquad (1.4)$$

This definition corresponds to the notion of probabilistic noninterference.
In [CP06], Chatzikokolakis and Palamidessi showed that the condition
expressed by (1.4) is equivalent to:

$$\forall s, s', o \quad p(o \mid s) = p(o \mid s') \qquad (1.5)$$

i.e. the probability of the system producing an observable is the same, 
no matter what the secret information is. This definition is known as 
equality of likelihoods and is advantageous as it does not depend on the 
probability distribution on secrets. 
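As a small illustration of (1.4) and (1.5), the sketch below represents a system by a prior on secrets and a channel matrix of conditional probabilities $p(o \mid s)$, computes posteriors with Bayes' rule, and checks both conditions with exact rational arithmetic. The dictionary encoding and the function names (`posterior`, `equal_likelihoods`, `strongly_anonymous`) are illustrative assumptions, not taken from [CP06].

```python
from fractions import Fraction

def observables(channel):
    """All observables appearing in some row of the channel matrix."""
    return {o for row in channel.values() for o in row}

def posterior(prior, channel):
    """Bayes' rule: p(s|o) = p(s) p(o|s) / p(o), for every o with p(o) > 0.

    prior:   dict secret -> p(s)
    channel: dict secret -> dict observable -> p(o|s)
    """
    obs = observables(channel)
    joint = {(s, o): prior[s] * channel[s].get(o, 0) for s in prior for o in obs}
    p_o = {o: sum(joint[(s, o)] for s in prior) for o in obs}
    return {(s, o): p / p_o[o] for (s, o), p in joint.items() if p_o[o] > 0}

def equal_likelihoods(channel):
    """Condition (1.5): every row of the channel matrix is the same distribution."""
    rows = list(channel.values())
    return all(row.get(o, 0) == rows[0].get(o, 0)
               for row in rows for o in observables(channel))

def strongly_anonymous(prior, channel):
    """Condition (1.4): observing o never changes the probability of any secret."""
    return all(p == prior[s] for (s, _), p in posterior(prior, channel).items())

# A non-uniform prior and a channel whose rows are all identical.
prior = {"alice": Fraction(1, 2), "bob": Fraction(1, 3), "carol": Fraction(1, 6)}
same_rows = {s: {"o1": Fraction(1, 4), "o2": Fraction(3, 4)} for s in prior}
print(equal_likelihoods(same_rows), strongly_anonymous(prior, same_rows))  # True True
```

Note that `equal_likelihoods` needs only the channel matrix, which mirrors the remark above that (1.5) does not depend on the distribution on secrets.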

A more restrictive definition of strong anonymity was proposed by Halpern
and O'Neill [HO03, HP05]. It is equivalent to either of the previous
definitions ((1.4) or (1.5)) together with the assumption that the input
distribution is uniform. Halpern and O'Neill focused on the adversary's lack
of confidence in her guess about the secret, and defined strong anonymity
as:

$$\forall s, s', o \quad p(s \mid o) = p(s' \mid o) \qquad (1.6)$$

The formulation (1.6) is also known as conditional anonymity and cor- 
responds to the level of anonymity called beyond suspicion in Reiter and 
Rubin's classification. 

Beyond suspicion From the attacker's point of view, an agent is no more 
likely to be the culprit than any other agent in the system. It can be 
formalized as in (1.6). 

Probable innocence From the attacker's point of view, an agent does not 
appear more likely to be involved in an event than not to be involved. 
Formally: 

$$\forall s, o \quad p(s \mid o) \leq 0.5 \qquad (1.7)$$

The formulation (1.7), however, is not universally accepted as the
definition of probable innocence. In [CP06], Chatzikokolakis and Palamidessi
showed that the property that Reiter and Rubin actually proved for the
Crowds protocol in [RR98] was:

$$\forall s, o \quad p(o \mid s) \leq 0.5 \qquad (1.8)$$

Possible innocence From the attacker's point of view, there is always a non- 
zero probability that the agent involved in the event is someone else. 
Formally: 

$$\forall s, o. \; \big( p(s \mid o) > 0 \Rightarrow \exists s' \neq s. \; p(s' \mid o) > 0 \big)$$

The above hierarchy gives a richer classification of the degree of protection 
offered by a system than would be possible with simpler possibilistic models. 
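The following sketch checks, for a given table of posteriors $p(s \mid o)$, which levels of the hierarchy are satisfied: beyond suspicion as in (1.6), probable innocence as no secret having posterior above one half, and possible innocence as some other secret always keeping positive probability. The table layout and the function names are assumptions made for this illustration only.

```python
from fractions import Fraction

# post[o][s] = p(s|o): one column of posteriors per observable.

def beyond_suspicion(post):
    """(1.6): for each observable, every secret has the same posterior."""
    return all(len(set(col.values())) == 1 for col in post.values())

def probable_innocence(post):
    """No secret is more likely than not to be the culprit, for any observable."""
    return all(p <= Fraction(1, 2) for col in post.values() for p in col.values())

def possible_innocence(post):
    """Whenever s has positive posterior, some other s' also has positive posterior."""
    return all(
        any(q > 0 for t, q in col.items() if t != s)
        for col in post.values()
        for s, p in col.items()
        if p > 0
    )

def strongest_level(post):
    """Return the strongest level of the hierarchy that the posteriors satisfy."""
    if beyond_suspicion(post):
        return "beyond suspicion"
    if probable_innocence(post):
        return "probable innocence"
    if possible_innocence(post):
        return "possible innocence"
    return "none"

half, quarter = Fraction(1, 2), Fraction(1, 4)
print(strongest_level({"o": {"a": half, "b": quarter, "c": quarter}}))  # probable innocence
```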

Among the quantitative approaches to anonymity, two are of special interest
to us: those based on information-theoretic concepts and those based on the
Bayes risk. Below we give a brief overview of these two approaches, which will
be revisited in more detail in Chapter 3.

Anonymity protocols as noisy channels 

Information-theoretic approaches to anonymity, and more generally to
information flow, rely on concepts such as entropy and mutual information to
measure the adversary's lack of information about the secret before and after
observing the system's output. Typically the system is seen as a noisy channel
from secrets to observables, and the degree of protection is inversely related
to the channel capacity; in particular, noninterference corresponds to a
capacity of zero.
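A minimal sketch of the noisy-channel view, assuming the system is given as a prior over secrets and a channel matrix `channel[s][o]` $= p(o \mid s)$: it computes the mutual information $I(S;O) = H(S) - H(S \mid O)$ in bits, which is zero when all rows of the matrix coincide. The function names are illustrative.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(prior, channel):
    """I(S;O) = H(S) - H(S|O) for prior[s] = p(s) and channel[s][o] = p(o|s)."""
    idx = range(len(prior))
    p_o = [sum(prior[s] * channel[s][o] for s in idx) for o in range(len(channel[0]))]
    h_s_given_o = 0.0
    for o, po in enumerate(p_o):
        if po > 0:
            post = [prior[s] * channel[s][o] / po for s in idx]  # p(s|o)
            h_s_given_o += po * entropy(post)
    return entropy(prior) - h_s_given_o

uniform = [0.5, 0.5]
print(mutual_information(uniform, [[0.9, 0.1], [0.1, 0.9]]))  # about 0.531 bits
print(mutual_information(uniform, [[0.3, 0.7], [0.3, 0.7]]))  # about 0: identical rows
```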

Several works in the literature have proposed measures of the degree of
anonymity in terms of entropy and mutual information, for instance
[SD02, DSCP02, ZB05, DPW06]. In [CPP08a] Chatzikokolakis, Palamidessi
and Panangaden proposed the concept of conditional capacity to cope with
situations where some leakage of information is intended by the system.
Consider again the election protocol example. By design, the final vote count
needs to be announced, and it usually increases the attacker's knowledge
about the secret. In this situation, the leakage should be calculated modulo
the information that is supposed to be disclosed, i.e. the vote count. In the
same work the authors also proposed methods to calculate the channel capacity
by exploiting symmetries present in several practical systems.
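The simplest instance of exploiting symmetry is the textbook result for symmetric channels (every row a permutation of every other row, and likewise for columns): the capacity is $\log_2 |\mathcal{O}| - H(\text{row})$, attained by the uniform input distribution. The sketch below compares this formula with a brute-force maximization over input distributions for a two-input channel. It only illustrates the idea under that symmetry assumption and does not reproduce the methods of Chatzikokolakis, Palamidessi and Panangaden; the helper names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits (zero entries are skipped)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(prior, channel):
    """I(S;O) = H(O) - H(O|S) for prior[s] = p(s) and channel[s][o] = p(o|s)."""
    p_o = [sum(prior[s] * channel[s][o] for s in range(len(prior)))
           for o in range(len(channel[0]))]
    return entropy(p_o) - sum(prior[s] * entropy(channel[s]) for s in range(len(prior)))

def symmetric_capacity(channel):
    """log2 |O| - H(row); valid only if the channel matrix is symmetric."""
    return math.log2(len(channel[0])) - entropy(channel[0])

def brute_force_capacity_2(channel, steps=2000):
    """Approximate capacity of a two-input channel by scanning priors (q, 1 - q)."""
    return max(mutual_information([i / steps, 1 - i / steps], channel)
               for i in range(steps + 1))

bsc = [[0.9, 0.1], [0.1, 0.9]]      # binary symmetric channel
print(symmetric_capacity(bsc), brute_force_capacity_2(bsc))  # both about 0.531
blind = [[0.5, 0.5], [0.5, 0.5]]    # output independent of the secret
print(symmetric_capacity(blind))    # 0.0: zero capacity, i.e. noninterference
```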

Hypothesis testing and Bayes risk 

In some real-world situations an individual is interested in the value of
some random variable $A \in \mathcal{A}$ but has access only to the values of
another random variable $O \in \mathcal{O}$. She knows that $A$ and $O$ are
correlated through a known conditional probability distribution. This
situation occurs in several fields, for instance in medicine (to make a
diagnosis, the physician has access to a list of symptoms, but not to the
disease itself). The attempt to infer $A$ from $O$ is known as the problem of
hypothesis testing. Here we are interested in the use of hypothesis testing
in the context of anonymity (and information flow). More specifically, the
adversary tries to infer the secret $A$ given that she has access to the
observables $O$ and knows how the system works, i.e. the conditional
probabilities of $O$ given $A$.

A commonly studied approach to the problem is based on the Bayesian
method: it assumes that the a priori probability distribution on $A$ is known
and, from it and from the knowledge of how the system works, derives an a
posteriori probability distribution after some fact has been observed. It is
well known that the best strategy for the adversary is to apply the MAP rule
(Maximum A Posteriori Probability rule), which, as the name suggests, chooses
the hypothesis with the maximum probability for the given observation. Here,
by "best" strategy we mean the one that induces the smallest probability of
error in guessing the hypothesis, which in this case is the Bayes risk.
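For a finite system the MAP rule and its probability of error can be computed directly from the prior and the channel matrix: the MAP guess for an observable $o$ is a secret maximizing $p(s)\,p(o \mid s)$, and the Bayes risk is $1 - \sum_o \max_s p(s)\,p(o \mid s)$. A minimal sketch with exact fractions; the function names are our own.

```python
from fractions import Fraction

def map_guess(prior, channel, o):
    """MAP rule: a secret s maximizing p(s) p(o|s), i.e. maximizing p(s|o).
    Ties are broken in favor of the lowest index."""
    return max(range(len(prior)), key=lambda s: prior[s] * channel[s][o])

def bayes_risk(prior, channel):
    """Probability that the MAP guess is wrong: 1 - sum_o max_s p(s) p(o|s)."""
    n_obs = len(channel[0])
    return 1 - sum(max(prior[s] * channel[s][o] for s in range(len(prior)))
                   for o in range(n_obs))

half, quarter = Fraction(1, 2), Fraction(1, 4)
prior = [half, half]
channel = [[3 * quarter, quarter],
           [quarter, 3 * quarter]]
print([map_guess(prior, channel, o) for o in range(2)])  # [0, 1]
print(bayes_risk(prior, channel))                         # 1/4
```

When all rows of the matrix coincide, the risk reduces to $1 - \max_s p(s)$: the observation does not help, and the best the adversary can do is guess the a priori most likely secret.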

In [CPP08b] Chatzikokolakis, Palamidessi and Panangaden explored the
hypothesis-testing approach to anonymity in a scenario where the adversary
has a single try to guess the secret (after exactly one observation). They
associated the level of anonymity with the probability of error, i.e. the
probability of the attacker making a wrong guess about the secret. In order
to consider the worst-case scenario and to give upper bounds on the level of
anonymity provided, the adversary is assumed to use the MAP rule. In this
case, the probability of error is the Bayes risk, and the degree of protection
offered by a protocol corresponds to the Bayes risk associated with its
channel matrix.
In [Smi07, Smi09] Smith also considered the scenario of one-try attacks
and proposed the notion of vulnerability, which takes into consideration the
probability that the adversary guesses the secret correctly after observing
the behavior of the system only once. Smith proposed the framework of
min-entropy leakage, which is closely related to the Bayes risk but differs
from it in that it uses the concept of entropy (more precisely, min-entropy)
and formalizes leakage in information-theoretic terms.
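In Smith's framework, the prior vulnerability is $V(S) = \max_s p(s)$, the posterior vulnerability is $V(S \mid O) = \sum_o \max_s p(s)\,p(o \mid s)$, and the min-entropy leakage is $\log_2 V(S \mid O) - \log_2 V(S)$; $V(S \mid O)$ is exactly one minus the Bayes risk of the MAP rule. A minimal sketch, with illustrative names:

```python
import math

def prior_vulnerability(prior):
    """V(S): probability of guessing the secret in one try without observing anything."""
    return max(prior)

def posterior_vulnerability(prior, channel):
    """V(S|O) = sum_o max_s p(s) p(o|s); equals 1 minus the Bayes risk."""
    return sum(max(prior[s] * channel[s][o] for s in range(len(prior)))
               for o in range(len(channel[0])))

def min_entropy_leakage(prior, channel):
    """H_inf(S) - H_inf(S|O) = log2 V(S|O) - log2 V(S), in bits."""
    return (math.log2(posterior_vulnerability(prior, channel))
            - math.log2(prior_vulnerability(prior)))

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(min_entropy_leakage([0.25] * 4, identity))                      # 2.0: everything leaks
print(min_entropy_leakage([0.5, 0.5], [[0.75, 0.25], [0.25, 0.75]]))  # log2(1.5), about 0.585
```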

In Chapter 3 we will present a deeper discussion about the use of infor- 
mation theory for the formalization of information flow, including the notions 
of Shannon entropy, mutual information and the framework of min-entropy 
leakage for one-try attacks. First, however, we review some fundamental
anonymity protocols in the literature.

Examples of anonymity protocols 

On the Internet, every computer has a unique IP address which specifies the 
computer's logical location in the topology of the network. This IP address 
is usually sent along with any request originating from the computer. Even 
if the computer uses an IP address for a single session via an ISP (Internet
Service Provider), the identification can be logged and retrieved later with
the ISP's cooperation. One common way to try to preserve anonymity is to use
a proxy, i.e. an intermediary computer that gathers all the requests of a
group of computers and serves as the single gateway for any communication
with the world outside the network. For practical purposes, it is as if all
the requests originated from the proxy, and the members of the group are
indistinguishable from the point of view of an outside observer. One drawback
of proxies is that they create single points of failure, decreasing the
network's robustness.

The problem illustrated above is one of the motivations for the use of com- 
munication protocols specifically designed to protect anonymity. In this section 
we review two of the most fundamental, and probably most famous, examples
of anonymity protocols in the literature: the dining cryptographers protocol
and the Crowds protocol.

The dining cryptographers The dining cryptographers protocol was pro- 
posed by Chaum in [Cha88]. It is one of the first anonymity protocols in the 
literature, and it is one of the few protocols that can assure strong anonymity. 

The protocol is usually presented in a simplified scenario, where three cryp- 
tographers employed by the NSA (The National Security Agency of the United 
States) are having dinner in a restaurant. At the end of the dinner, the NSA 
decides whether it will pay the bill itself or whether it will assign the duty of 
paying to one of the cryptographers at the table. In the case the NSA decides 
that one of the cryptographers will pay, it announces the decision secretly to 
the chosen one. The goal of the protocol is to reveal whether one cryptographer 
will pay the bill or not, without revealing the identity of the payer. In other
words, to an external observer (and to the non-paying cryptographers as well),
the only accessible information is whether the NSA is paying or not, but not
the identity of the cryptographer paying (if any). We assume that the NSA
does not disclose its decision to anyone except the cryptographer it chooses
(again, if any), and that the solution should be distributed, i.e. only message
passing between agents is allowed, and no centralized agent coordinates the
process.

The dining cryptographers protocol solves this problem as shown
schematically in Figure 1.1. Each cryptographer ($Crypt_0$, $Crypt_1$ and
$Crypt_2$) tosses a coin that is visible only to himself and to his right-hand
neighbor. In this way every cryptographer shares a coin with each of the other
two. After all three coins ($c_0$, $c_1$ and $c_2$) are tossed, each
cryptographer checks whether the two coins visible to him agree (both heads
or both tails) or disagree (one heads and the other tails). Then they publicly
announce agree or disagree, according to the result they obtained with their
coins. The only exception is that, if a cryptographer is paying, he announces
the opposite of what he sees, i.e. he announces disagree if his coins agree
and agree if they do not. It can be proven that if the number of disagrees is
even, then the NSA is paying, and if the number of disagrees is odd, then one
of the cryptographers is paying: each coin is seen by exactly two
cryptographers, so its contribution to the parity of the announcements
cancels out, and only the payer's inversion remains. Moreover, if the coins
are all fair, the protocol offers strong anonymity in the following sense: the
execution of the protocol does not provide an external observer with any
evidence that changes her knowledge about which cryptographer is the payer,
if any. In other words, the probability of any cryptographer being the payer,
from the adversary's point of view, does not change after observing the
protocol's execution.
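The parity argument and the anonymity claim can be checked mechanically for the three-cryptographer case. In the sketch below cryptographer `i` sees coins `i` and `(i + 1) % n` (the orientation of the ring is an arbitrary choice), an announcement of `1` stands for disagree, and `payer` is `None` when the NSA pays. Enumerating all outcomes of fair coins shows that the distribution of announcements is the same whichever cryptographer pays. The function names are hypothetical.

```python
import itertools
import random
from collections import Counter

def announce(coins, payer):
    """Announcements of a ring of len(coins) cryptographers (1 = "disagree")."""
    n = len(coins)
    out = []
    for i in range(n):
        disagree = coins[i] ^ coins[(i + 1) % n]  # do my two visible coins differ?
        if i == payer:
            disagree ^= 1                          # the payer announces the opposite
        out.append(disagree)
    return tuple(out)

def someone_paid(announcements):
    """An odd number of "disagree" announcements means a cryptographer is paying."""
    return sum(announcements) % 2 == 1

def announcement_distribution(payer, n=3):
    """Exact distribution of announcements over all 2**n equally likely fair-coin outcomes."""
    return Counter(announce(coins, payer) for coins in itertools.product((0, 1), repeat=n))

rng = random.Random(0)
coins = [rng.randrange(2) for _ in range(3)]
print(someone_paid(announce(coins, payer=1)))  # True, whatever the coins are
```

Comparing `announcement_distribution(0)`, `announcement_distribution(1)` and `announcement_distribution(2)` gives identical counters, which is the strong anonymity property for an observer who sees only the announcements.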

The dining cryptographers protocol can be generalized to any number of
graph nodes (i.e. cryptographers) and any type of graph connectivity (i.e.
which pairs of cryptographers share coins). Then the same solution can be used
for anonymous communication as follows. Each pair of adjacent nodes shares a
common secret (the value of the coin) of length $n$, equal to the length of
the transmitted data. It is assumed that the coins are drawn uniformly from
the set of possible secrets. Each node then computes the binary sum (XOR) of
all its shared secrets and announces the result. The only exception is that
the node that wants to transmit also adds the datum, also of length $n$, to
the sum it announces. It can be shown that the total sum of the announcements
of all nodes is equal to the transmitted data, since each secret is counted
twice (once by each node that can see it) and is therefore canceled out by the
XOR operation. The protocol works under the assumption that only one node at
a time tries to transmit; if more than one sender wants to transmit at the
same time, the conflict needs to be resolved by some sort of coordinator.
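A sketch of one round of the generalized protocol, assuming the graph is given as a list of edges, each edge carries a uniformly random pad of the same length as the message (drawn here with Python's `secrets.token_bytes`), and at most one node transmits. XOR-ing all announcements recovers the message because every pad is included by exactly its two endpoints. Names such as `dc_net_round` are hypothetical.

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def dc_net_round(n_nodes, edges, sender, message):
    """Announcements of all nodes; sender is a node index, or None for a silent round."""
    length = len(message)
    pads = {edge: secrets.token_bytes(length) for edge in edges}  # one shared secret per edge
    announcements = []
    for node in range(n_nodes):
        acc = bytes(length)                   # all-zero accumulator
        for (u, v), pad in pads.items():
            if node in (u, v):                # a node sees the pads on its own edges
                acc = xor(acc, pad)
        if node == sender:
            acc = xor(acc, message)           # the sender also adds the datum
        announcements.append(acc)
    return announcements

def combine(announcements):
    """XOR of all announcements: each pad appears twice and cancels out."""
    return reduce(xor, announcements)

ring_with_chord = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
print(combine(dc_net_round(5, ring_with_chord, sender=3, message=b"hello")))  # b'hello'
```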

One drawback of the dining cryptographers protocol is its inefficiency:
whenever a single node wants to transmit, all the nodes in the graph need
to collaborate to make it happen, at the cost of a large number of message
exchanges. Moreover, as previously stated, in the case where more than one
node wants to transmit at the same time, a coordinator is necessary to solve
the conflict.

Figure 1.1: An example of the dining cryptographers protocol

Crowds The Crowds protocol was first presented in [RR98] and it allows 
Internet users to perform web transactions without revealing their identity. 
Usually, on the Internet, when a user communicates with a server the latter 
can discover the IP address of the originator. The idea behind Crowds is to 
gather users into a crowd and randomly redirect the request multiple times 
inside the group before finally letting it reach the server. In this situation,
neither the server nor any other user can identify with certainty the
initiator of a request it receives: whenever someone sends a message, there is
a considerable probability that she is only forwarding it for someone else.

To be more precise, a crowd is a group of $m$ users who participate in the
protocol. It is possible that a subgroup of $c$ users is corrupted and
collaborates to disclose the identity of the original sender. We also assume
that the protocol has a parameter $p_f \in (0, 1]$. We call originator or
initiator the user who wants to make a request to the server. The originator
needs to create a path between herself and the server in order to have her
request reach its final destination, as shown in Figure 1.2.

The protocol works as follows: 

• At the first step the initiator chooses, according to a uniform probability
distribution, a user in the crowd (possibly herself) and forwards the request
to this user;

• The user who receives the message then makes a random choice. With
probability $p_f$ she forwards the message to the server, and with
probability $1 - p_f$ she decides to forward the message to some user in the
crowd. If this is the case, she chooses a user (possibly herself) according
to a uniform probability distribution, and forwards the message to this
user. This step is then repeated by the new message holder.

Figure 1.2: The Crowds protocol at work

The response from the server to the originator follows the same path, in 
the opposite direction. Moreover, all the communications in a path are en- 
crypted using a path key, which protects the path from threats posed by local 
eavesdroppers. Each user has access to the communications in which she par- 
ticipates, but it is assumed that a user cannot intercept messages exchanged 
between other users. It can be proven that the protocol is strongly anonymous 
with respect to the web server. Intuitively this is the case because at least one 
forward step is always performed, and after this step any user can be the holder 
of the message with equal probability. Therefore, from the server's point of 
view any user is equally likely to be the originator of the request. 

It is more interesting to analyze the level of anonymity ensured with
respect to a corrupted user. If in the very first step of the execution of the
protocol the message is forwarded to a corrupted user, she can gain more
information about the possible originator than the server can. A user,
whether the originator or not, is said to be detected if she sends a message
to a corrupted user. Since the originator always appears in a path, she is
more likely to be detected than the rest of the users. Detecting a user (at
least for the first time in a path) increases the probability that this user
is the originator. Therefore, strong anonymity cannot hold with respect to
corrupted users.
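The effect can be estimated by simulation. The sketch below follows the description above, reading $p_f$ as the probability that the current holder delivers the request to the server; users `0, ..., c-1` are the corrupted ones and the (honest) initiator is user `m-1`, both arbitrary choices. It estimates the probability that the detected user is the initiator, given that some corrupted user appears on the path. Under these assumptions a short calculation gives the closed form $1 - (1 - p_f)(m - c - 1)/m$, which the simulation can be compared against; all names are hypothetical.

```python
import random

def crowds_path(m, initiator, p_f, rng):
    """Users holding the request, in order, starting with the initiator.

    p_f is the probability that the current holder sends the request to the server.
    """
    path = [initiator, rng.randrange(m)]  # the first forward step is always taken
    while rng.random() >= p_f:            # with probability 1 - p_f, forward again
        path.append(rng.randrange(m))
    return path

def detection_probability(m, c, p_f, trials=100_000, seed=0):
    """Estimate p(initiator detected | some corrupted user is on the path)."""
    rng = random.Random(seed)
    initiator = m - 1                     # honest, since corrupted users are 0..c-1
    detected_initiator = reached = 0
    for _ in range(trials):
        path = crowds_path(m, initiator, p_f, rng)
        for k in range(1, len(path)):
            if path[k] < c:               # first corrupted user on the path
                reached += 1
                detected_initiator += path[k - 1] == initiator  # her predecessor is detected
                break
    return detected_initiator / reached

def closed_form(m, c, p_f):
    return 1 - (1 - p_f) * (m - c - 1) / m

print(detection_probability(10, 2, 0.3), closed_form(10, 2, 0.3))  # both close to 0.51
```

With these parameters the value exceeds 0.5; increasing `m` relative to `c` lowers it (for instance, `closed_form(20, 2, 0.3)` is 0.405).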

In [RR98] it is proven that if the number $c$ of corrupted users is not too
large, the protocol can at least ensure the level of protection of probable
innocence. More precisely, if the number $m$ of users in the crowd satisfies



then the protocol ensures probable innocence in the sense of (1.8).

1.3.2 Statistical disclosure control 

The field of statistical disclosure control concerns the problem of revealing ac- 
curate statistics about a set of respondents while preserving the privacy of 
individuals. In statistical databases, the data of a (large) number of
participants is compiled, and users are allowed to pose statistical queries
(such as averages or counts) about the sample. This kind of database is of
special importance in many areas. For instance, medical databases can pro- 
vide information about how a disease spreads, and a census database can help 
authorities to decide how to spend the next year's budget. 

The data in a statistical database can be obtained in different ways. It can 
be collected in a census, for instance, it can be obtained opportunistically by 
monitoring the traffic in a network, or it can even be given by the participants 
by their own choice. No matter how the data is obtained, however, it is still 
important to ensure that the individual's participation in the database will 
not harm her privacy. This is not a trivial goal to achieve: the main purpose 
of a statistical database, in the first place, is to reveal some information about
the population as a whole, i.e. to let users infer "general truths" about this 
population. As an example, suppose that a statistical database of individuals 
of a certain country indicates that, in this population, the life expectancy for 
women is 5 years longer than for men. Clearly this piece of information reveals 
something about the whole population, even about individuals not present in 
the database. 

There are several approaches to dealing with the problem of preserving 
privacy in statistical databases. One of them is based on ensuring large query 
sets, i.e. that no query can be posed for a small set of individuals. The 
problem with this approach is that, even if two query sets are "large enough", 
their combination may not be. Consider the following two queries: "How many
people have disease y?" and "How many people, not named X, have disease
y?". Both queries operate on large sets, but the difference between the two
answers immediately reveals whether the individual named X has disease y.
Another attempt to achieve privacy is based on the encryption of the data
in the dataset. This is not a general solution since, as we have seen, the privacy 
threats do not concern only the individuals in the database and, therefore, the 
encryption of the data will not address this issue. 

Another possible solution is to apply some sort of query auditing: the
curator of the database checks whether or not a query is potentially
disclosing before deciding to answer it. This approach would cope with the
problem of the two overlapping queries mentioned above, yet it presents two
serious drawbacks: first, checking every query automatically is infeasible in
practice; and, second, the refusal to answer a query can itself be a disclosing
act. Another attempt to deal with the problem is subsampling the dataset. We
normally view a dataset as a collection of rows, where each row contains the
data of a particular participant. The idea of subsampling is to randomly
choose a subset of the rows, compute the answer to the query on this
subsample, and report it as the final answer. If the subset is large enough,
it should reflect the statistical properties of the whole database. This
approach, however, protects a participant only to the extent that it is
unlikely that she is in the subsample. If being in the subsample has
catastrophic consequences, then someone will always be seriously harmed.

The input perturbation approach is based on modifying either the data
or the query in the hope of confusing the adversary. For instance, a
randomized response mechanism can be used at the moment the data is
acquired. This modification is permanent, and not even the curator knows
what the original data was. Queries to the database are then answered taking
the randomized noise into account.

Yet another approach is to add randomized noise to the answer of the 
query. The idea is to compute the answer on the complete set of (the original) 
values in the database, and then randomize the response before reporting it 
to the user. If this is done naively, however, the noise can easily be filtered
out by the adversary. Suppose that the noise is chosen to be additive Gaussian
noise with mean zero. If the query is repeated a sufficient number of times,
a statistical analysis of the answers can easily estimate with high accuracy 
what the real answer is. Even if the curator of the database opts to record the 
query and always report the same answer for it, it may not solve the problem: 
syntactically different queries can be semantically equivalent, and if the query 
language is rich enough the semantic equivalence is undecidable. 
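The averaging attack described above takes only a few lines to demonstrate. The sketch assumes a curator that adds fresh zero-mean Gaussian noise (standard deviation `sigma`) each time the same counting query is asked; the function names and parameters are hypothetical.

```python
import random
import statistics

def noisy_answer(true_answer, sigma, rng):
    """Naive mechanism: the true answer plus fresh zero-mean Gaussian noise."""
    return true_answer + rng.gauss(0, sigma)

def averaging_attack(true_answer, sigma, repeats, seed=0):
    """Ask the same query `repeats` times and average the replies.

    The mean of the replies has standard deviation sigma / sqrt(repeats),
    so the noise washes out as the number of repetitions grows.
    """
    rng = random.Random(seed)
    return statistics.fmean(noisy_answer(true_answer, sigma, rng) for _ in range(repeats))

print(round(averaging_attack(42, sigma=10, repeats=10_000), 1))  # close to 42
```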

In this context, it is clear that the problem of statistical disclosure control 
is not trivial. Yet another issue to be considered is auxiliary (or side) infor- 
mation. Auxiliary information is any piece of data about individuals that the 
attacker has and that does not come from the database itself. It may originate 
from priors, beliefs, newspapers or even other databases. Some decades ago, 
Dalenius [Dal77] considered the problem of auxiliary information and proposed 
a famous "ad omnia" privacy desideratum: nothing about an individual should 
be learnable from the database that could not be learned without access to the 
database. In other words, whatever knowledge about individuals the adversary
can obtain from her side information alone, learning the responses from the
database should not increase it. Dalenius' property is, however, too strong to
be useful in practice: Dwork showed in [Dwo06] that no useful database can
satisfy it. She then proposed the notion of differential privacy, which is
based on the idea that the presence or absence of an individual in the
database, or the individual's particular value, should not significantly
change the probability of obtaining a certain answer for a given query
[Dwo06, Dwo10, Dwo11, DL09].

The concept of differential privacy can be formalized as follows. Let
$\mathcal{X}$ be the set of all possible databases, and $\mathcal{Z}$ be the set
of possible answers to a query. Two databases $x, x' \in \mathcal{X}$ are
adjacent (or neighbors), written $x \sim x'$, if they differ in the value of
exactly one individual. Then, for some $\epsilon > 0$:

Definition 1 ([Dwo11]). A randomized function $\mathcal{K}$ from $\mathcal{X}$
to $\mathcal{Z}$ satisfies $\epsilon$-differential privacy if for all pairs
$x, x' \in \mathcal{X}$ with $x \sim x'$, and all $S \subseteq \mathcal{Z}$, we
have:

$$\Pr[\mathcal{K}(x) \in S] \leq e^{\epsilon} \cdot \Pr[\mathcal{K}(x') \in S]$$
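For a mechanism with finitely many answers, Definition 1 can be checked by comparing the probabilities of single answers, since the probability of any set $S$ is a sum over its elements. The sketch below does this and applies it to randomized response, the input-perturbation technique mentioned earlier, on a database of `n_bits` one-bit records where each bit is reported truthfully with probability `p_truth` and flipped otherwise; adjacency means differing in exactly one record. All names are illustrative.

```python
import math
from itertools import product

def satisfies_dp(mech, adjacent, eps, tol=1e-12):
    """mech[x][z] = Pr[K(x) = z] for finitely many databases x and answers z.

    Checking singleton sets S = {z} suffices: Pr[K(x) in S] is a sum over z in S.
    """
    bound = math.exp(eps)
    for x, x2 in product(mech, repeat=2):
        if adjacent(x, x2):
            for z, p in mech[x].items():
                if p > bound * mech[x2].get(z, 0.0) + tol:
                    return False
    return True

def randomized_response(n_bits, p_truth):
    """Each one-bit record is reported truthfully with probability p_truth, else flipped."""
    mech = {}
    for x in product((0, 1), repeat=n_bits):
        mech[x] = {}
        for z in product((0, 1), repeat=n_bits):
            prob = 1.0
            for xi, zi in zip(x, z):
                prob *= p_truth if xi == zi else 1 - p_truth
            mech[x][z] = prob
    return mech

def differ_in_one(x, y):
    """Adjacency: the databases differ in the value of exactly one individual."""
    return sum(a != b for a, b in zip(x, y)) == 1

rr = randomized_response(3, 0.75)
print(satisfies_dp(rr, differ_in_one, math.log(3)))        # True: ratio at most 0.75/0.25
print(satisfies_dp(rr, differ_in_one, math.log(3) - 0.1))  # False
```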

The concept of differential privacy has had an extraordinary impact in 
the database community, and we will discuss the meaning and implications 
of the above formulation in greater depth in Chapter 5. For the moment, it 
is enough to note that this definition intuitively ensures that individuals can
opt in or out of the database without significantly changing the probability of
any given answer being reported. In other words, it is "safe" for an
individual to join (or to leave) the database. Dwork also showed that, in
order to ensure differential privacy, it is enough to add Laplacian noise to
the answer, calibrated to the sensitivity of the query [Dwo06].
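A minimal sketch of the Laplace mechanism for a counting query, whose answer changes by at most 1 between adjacent databases (its sensitivity), so noise with scale $1/\epsilon$ is used. Python's standard library has no Laplace sampler, so the sketch relies on the fact that the difference of two independent exponential variables with rate $1/b$ is Laplace-distributed with scale $b$. The function names are hypothetical; the last function checks the density ratio of Definition 1 on a grid of points.

```python
import math
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with rate 1/scale."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(db, predicate, eps, rng):
    """epsilon-DP count of rows satisfying predicate (sensitivity 1, noise scale 1/eps)."""
    true_count = sum(1 for row in db if predicate(row))
    return true_count + laplace_noise(1 / eps, rng)

def laplace_density(z, mu, scale):
    return math.exp(-abs(z - mu) / scale) / (2 * scale)

def worst_density_ratio(count, neighbour_count, eps, grid):
    """Max over z in grid of density(z | count) / density(z | neighbour_count)."""
    return max(laplace_density(z, count, 1 / eps) / laplace_density(z, neighbour_count, 1 / eps)
               for z in grid)

rng = random.Random(0)
db = list(range(100))
print(private_count(db, lambda row: row % 2 == 0, eps=0.5, rng=rng))  # 50 plus Laplace noise
grid = [i / 10 for i in range(-200, 400)]
print(worst_density_ratio(50, 51, 0.5, grid) <= math.exp(0.5) + 1e-9)  # True
```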

Although differential privacy is a promising approach to the question of 
statistical disclosure control, the fact that it relies on the randomization of the 
query response poses some challenges with respect to the utility of the query 
mechanism. If the noise is not added with sufficient care, the reported answer 
can be so "different" from the real answer that the informative purpose of the 
database is compromised. In Chapter 5 we will come back to the question 
of how to apply differential privacy and, at the same time, provide maximum 
utility to the query mechanism. 

1.3.3 Refining specifications into implementations 

Deriving implementations of a system given its specification, while respecting 
security constraints, is a challenging problem in information hiding and, more 
generally, in security. A specification S is refined by an implementation P if 
P preserves all logically expressible properties of S. One needs to be care- 
ful, however, when refining a specification in the realm of information hiding. 
According to Morgan [Mor09]: 

A rigorous definition of how specifications relate to implementa- 
tions, as part of reasoning, must ensure that implementations re- 
veal no more than their specifications: they must, in effect, preserve 
ignorance. 
By "ignorance", the author means what the user does not know about what
she cannot see. This notion is closely related to the problem of information 
flow, i.e. determining how much about the secret behavior of a system an 
adversary can infer from an observation and her knowledge about how the 
system works. 

To illustrate the problem, we discuss the following example, adapted
from the original one in [Mor09]. Consider a partition of the program states
into visible ($v$) and hidden ($h$). Assume that the two variables $v$ and $h$
have the same domain $\mathbb{N}$ (the natural numbers), and that in a
specification $S$, after the value of $h$ is assigned, the following is stated:
choose $v$ from the domain $\mathbb{N}$. Then we can ask: "from the final value
of $v$, what can the observer deduce about the value of $h$, given that she
knows how the system works?". Of course the answer depends on how the
implementation $I$ of the specification is done. If $I$ is simply $v := 0$,
then nothing is learned, since what the user knows about the value of $h$ is
exactly what she already knew before. If the implementation is
$v := h \bmod 2$, then she can learn $h$'s parity. If the implementation is
$v := h$, then she learns the exact value of $h$. Intuitively, the three
implementations are in increasing order according to the loss of ignorance
they induce.
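The ordering of the three implementations can be made quantitative. Because $\mathbb{N}$ is infinite, the sketch below assumes, purely for illustration, that $h$ is uniformly distributed over $\{0, \dots, 7\}$; since each implementation is deterministic, the Shannon leakage $I(h; v)$ equals the entropy of $v$. The function name is hypothetical.

```python
import math
from collections import Counter

def shannon_leakage(implementation, domain):
    """I(h; v) for a deterministic implementation and uniform h over domain; equals H(v)."""
    counts = Counter(implementation(h) for h in domain)
    total = len(domain)
    return sum(n / total * math.log2(total / n) for n in counts.values())

implementations = {
    "v := 0": lambda h: 0,
    "v := h mod 2": lambda h: h % 2,
    "v := h": lambda h: h,
}
for name, impl in implementations.items():
    print(name, shannon_leakage(impl, range(8)))
# Leakage: 0.0, 1.0 and 3.0 bits respectively, matching the intuitive order.
```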

It is desirable that the implementation of a specification be "ignorance 
preserving", in the sense that the implementation should not reveal more about 
the secrets than the specification does. Some works in the literature suggest 
that one should be careful when dealing with secure refinements if one wants to 
preserve information-flow security properties. In [Jac89], for instance, Jacob 
shows that even if an implementation is a consistent refinement with respect to 
a specification, it does not imply that the (information-flow) security properties 
of the specification are preserved in the implementation. 

As pointed out in [CNP09], nondeterminism is often used in system specifi- 
cations as a way of abstracting from implementation details (such as scheduler 
policy). Implementations are obtained from specifications by refinement
algebras, which reduce nondeterminism. As we have seen in the previous
example, if we assume $v$ and $h$ are both of type $\mathbb{N}$, then the
specification choose $v$ from the domain $\mathbb{N}$ can be refined to
$v := h$, which is simply a reduction of nondeterminism. This is known as the
"refinement paradox" [Mor09], because it does
not preserve ignorance. While the specification does not tell anything about 
the value of h, the refinement completely reveals it. 

The process of reducing nondeterminism by refinements is related to the 
notion of schedulers in nondeterministic systems: designing an implementation 
of a specification involves choosing a scheduler to solve all the nondeterminism 
of the specification. The scheduler is indeed a final result of the refinement 
process, after all the nondeterminism is ruled out. 

According to this perspective, similar concerns about refinement algebras 
should be taken into consideration when dealing with schedulers. Indeed, it 
can be shown that, given a specification S and a scheduler that leads to an
implementation P consistent with S, the security properties of S are not
guaranteed to be preserved in P.

In the domain of refinement of specifications, the solution proposed in
[Mor09] is to apply some principles to the refinement algebra in order to assure 
the preservation of ignorance. These principles restrict the refinement relation, 
eliminating the cases that do not preserve ignorance. 

A similar problem arises in the context of concurrent systems, where the 
scheduler that resolves the nondeterminism can violate security properties. 
In Chapter 6 we focus on this problem and we propose restrictions on the 
schedulers that also lead to ignorance-preserving refinements. 

1.4 Plan of the thesis and contribution 

In Chapter 2 we review some basic notions necessary for the development 
of this thesis, including the concepts of probability spaces, probabilistic au- 
tomata and CCSp (a probabilistic version of the process algebra of concurrent 
communicating processes). 

In Chapter 3 we review the main approaches that have been considered 
to quantify the notion of information leakage using concepts of information 
theory. We explain concepts such as entropy, conditional entropy, mutual 
information and capacity. We focus on how distinct notions of entropy can 
model attackers with different levels of power, and we introduce the mathe- 
matical background necessary for most of this thesis. Finally we compare the 
main notions of uncertainty and leakage in the literature. 

In Chapter 4 we consider the problem of defining the information leakage 
in interactive systems where secrets and observables can alternate during the 
computation. We show that the information-theoretic approach that interprets 
such systems as classic channels is not valid. The principle can be recovered, 
however, if we consider channels of a more complicated kind, namely channels 
with memory and feedback. We show that there is a complete correspondence 
between interactive systems and such channels. We also propose the use of 
directed information, as opposed to mutual information, to represent leakage 
in interactive systems. This proposal is based on recent results in information 
theory that have shown that, in channels with memory and feedback, the 
transmission rate does not correspond to the maximum mutual information 
(the standard notion of capacity), but rather to the maximum (normalized) 
directed information. We show that our model is a proper extension of the 
classical one, i.e. in the absence of interactivity the model of channels with 
memory and feedback collapses into the model of memoryless channels without 
feedback. Finally, we show that the capacity of the channels associated with 
interactive systems is a continuous function with respect to a pseudometric 
based on the Kantorovich metric. 

In Chapter 5 we analyze critically the notion of differential privacy in the 
light of the conceptual framework provided by min-entropy leakage. We show
that there is a close relationship between differential privacy and leakage, due
to the graph symmetries induced by the adjacency relation on databases. Fur- 
thermore, we consider the utility of the randomized answer, which measures 
its expected degree of accuracy. We focus on certain kinds of utility functions 
called "binary", which have a close correspondence with the notion of min- 
entropy leakage and the Bayes risk. Again, there can be a tight correspondence 
between differential privacy and utility, depending on the symmetries induced 
by the adjacency relation and by the query. Using these symmetries we can, 
in some cases, build an optimal-utility randomization mechanism while pre- 
serving the required level of differential privacy. We also provide a study of 
the kind of structures that can be induced by the adjacency relation and the 
query, and how to use them to derive bounds on the leakage and achieve the 
optimal utility. 

In Chapter 6 we move away from the quantitative realm and focus on the 
problem of nondeterminism in systems specifications. In the field of security, 
process equivalences have been used to characterize various information-hiding 
properties (for instance secrecy, anonymity and noninterference) based on the 
principle that a protocol P with a variable x satisfies such a property if and 
only if, for every pair of secrets $s_1$ and $s_2$, $P[s_1/x]$ is equivalent to $P[s_2/x]$. We
argue that, in the presence of nondeterminism, the above principle relies on the 
assumption that the scheduler "works for the benefit of the protocol", and this 
is usually not a safe assumption. Non-safe equivalences, in this sense, include 
complete-trace equivalence and bisimulation. We present a formalism in which 
we can specify admissible schedulers and, correspondingly, safe versions of 
these equivalences. We prove that safe bisimulation is still a congruence. Then 
we show that safe equivalences can be used to establish information-hiding 
properties. 

Finally, in Chapter 7 we make our final observations. 

1.5 Publications 

Most of the results in this thesis have already been the subject of scientific 
publications. More precisely: 

• Chapter 3 is based on the paper Probabilistic Information Flow [AAP10b] that appeared in the proceedings of the 25th Annual IEEE Symposium on Logic in Computer Science (LICS 2010).

• Chapter 4 is based on the papers: 

— Information Flow in Interactive Systems [AAP10a] that appeared in the proceedings of the 21st International Conference on Concurrency Theory (CONCUR 2010);

— Quantitative Information Flow in Interactive Systems [AAP11] to appear in the Journal of Computer Security.

• Chapter 5 is based on two complementary works: 

— The paper On the relation between Differential Privacy and Quantitative Information Flow [AACP11] to appear in the proceedings of the 38th International Colloquium on Automata, Languages and Programming (ICALP 2011);

— The technical report Differential Privacy: on the trade-off 
between Utility and Information Leakage [AAC+11]. 

• Chapter 6 is based on the paper Safe Equivalences for Security Properties [AAPvR10] that appeared in the proceedings of the 6th IFIP International Conference on Theoretical Computer Science (IFIP-TCS 2010).
"I can make just such ones if I had tools, and I could make tools if I had tools to make them with."

Eli Whitney 

In this chapter we review some technical concepts from the literature that 
will be used throughout this thesis. 

2.1 Probability spaces 

In this section we recall some concepts about probability spaces. 

Let $\Omega$ be a set and $\mathcal{P}(\Omega)$ represent its powerset, i.e. the collection of all subsets of $\Omega$. A $\sigma$-algebra (also called $\sigma$-field) over $\Omega$ is a non-empty collection of sets $\mathcal{F} \subseteq \mathcal{P}(\Omega)$ that is closed under complementation and countable union.

For any $\sigma$-field $\mathcal{F}$ we have $\Omega \in \mathcal{F}$, and $\mathcal{F}$ is also closed under countable intersection (by De Morgan's laws).

A (positive) measure on $\mathcal{F}$ is a function $\mu : \mathcal{F} \to [0, \infty)$ such that

1. $\mu(\emptyset) = 0$, and

2. $\mu(\bigcup_i C_i) = \sum_i \mu(C_i)$, where $\{C_i\}_i$ is a countable collection of pairwise disjoint sets in $\mathcal{F}$.

A probability measure on $\mathcal{F}$ is a measure $\mu$ on $\mathcal{F}$ such that $\mu(\Omega) = 1$.

A probability space is a tuple $(\Omega, \mathcal{F}, \mu)$ where $\Omega$ is a non-empty set called the sample space, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ called the event space, and $\mu$ is a probability measure on $\mathcal{F}$. In the discrete case, $\Omega$ is countable and $\mathcal{F} = \mathcal{P}(\Omega)$. In this case we can construct $\mu$ from a function $p : \Omega \to [0, 1]$ satisfying $\sum_{x \in \Omega} p(x) = 1$ by assigning $\mu(\{x\}) = p(x)$. The function $p$ is called a probability distribution over $\Omega$.

The set of all probability measures with sample space $\Omega$ will be denoted by $\mathcal{D}(\Omega)$. We will also denote by $\delta_x(\cdot)$ (called the Dirac measure on $x$, or also a point mass) the probability distribution such that $\delta_x(\{x\}) = 1$.

If $A$ and $B$ are events, i.e. elements of a $\sigma$-field $\mathcal{F}$, then $A \cap B$ is also an event. If $\mu(A) > 0$ then we can define the conditional probability $p(B \mid A)$ as

$$p(B \mid A) = \frac{\mu(A \cap B)}{\mu(A)}$$

representing the probability of $B$ given that $A$ holds. Note that $p(\cdot \mid A)$ is a new probability measure on $\mathcal{F}$. For the scope of this thesis we are interested only in the discrete case, so it is enough to use the definition above and make sure that we never condition on an event $A$ with zero probability.
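A minimal Python sketch of the discrete case, assuming a finite sample space given as a dictionary from outcomes to exact `Fraction` probabilities (the helper names `measure_from_pmf` and `conditional` are illustrative only). It extends a distribution $p$ to the measure $\mu(E) = \sum_{x \in E} p(x)$ and computes $p(B \mid A)$, refusing to condition on an event of probability zero.

```python
from fractions import Fraction

def measure_from_pmf(p):
    """Extend a discrete distribution p (dict outcome -> prob) to events: mu(E) = sum_{x in E} p(x)."""
    if sum(p.values()) != 1 or any(v < 0 for v in p.values()):
        raise ValueError("p must be a probability distribution")

    def mu(event):
        return sum((p[x] for x in event), Fraction(0))

    return mu

def conditional(mu, b, a):
    """p(B | A) = mu(A ∩ B) / mu(A), defined only when mu(A) > 0."""
    if mu(a) == 0:
        raise ValueError("cannot condition on an event with zero probability")
    return mu(a & b) / mu(a)

# A fair die: Omega = {1, ..., 6}, and every subset of Omega is an event.
p = {x: Fraction(1, 6) for x in range(1, 7)}
mu = measure_from_pmf(p)
even, high = {2, 4, 6}, {4, 5, 6}
print(conditional(mu, high, even))   # 2/3
```

Here $\mu(\{2,4,6\} \cap \{4,5,6\}) = 1/3$ and $\mu(\{2,4,6\}) = 1/2$, hence the value $2/3$.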

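Let $\mathcal{F}, \mathcal{F}'$ be two $\sigma$-fields on $\Omega, \Omega'$ respectively. A random variable $X$ is a function $X : \Omega \to \Omega'$ that is measurable, meaning that the inverse image of every element of $\mathcal{F}'$ belongs to $\mathcal{F}$:

$$\forall C \in \mathcal{F}'.\ X^{-1}(C) \in \mathcal{F}$$

Then, given a probability measure $\mu$ on $\mathcal{F}$, $X$ induces a probability measure $\mu'$ on $\mathcal{F}'$ as

$$\forall C \in \mathcal{F}'.\ \mu'(C) = \mu(X^{-1}(C))$$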

If $\mu'$ is a discrete probability measure then it can be constructed from a probability distribution over $\Omega'$, called probability mass function (pmf), defined as

$$P([X = x]) = \mu(X^{-1}(\{x\}))$$

for each $x \in \Omega'$. The random variable in this case is called discrete. If $X, Y$ are discrete random variables then we can define a discrete random variable $(X, Y)$ by its pmf

$$P([X = x, Y = y]) = \mu(X^{-1}(\{x\}) \cap Y^{-1}(\{y\}))$$

If $X$ is a real-valued discrete random variable then its expected value (or expectation) is defined as

$$E(X) = \sum_i x_i \, P([X = x_i])$$
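The induced pmf and the expectation can be sketched in the same style, again for a finite sample space with exact probabilities; `pmf_of` and `expectation` are hypothetical helper names. Each outcome $\omega$ adds its probability to the value $X(\omega)$, which amounts to summing $\mu$ over the preimage $X^{-1}(\{x\})$; the pair $(X, Y)$ is obtained by mapping outcomes to tuples.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of(p, X):
    """pmf of the random variable X: P([X = x]) = mu(X^{-1}({x}))."""
    pmf = defaultdict(Fraction)
    for omega, prob in p.items():
        pmf[X(omega)] += prob          # omega belongs to X^{-1}({X(omega)})
    return dict(pmf)

def expectation(pmf):
    """E(X) = sum_i x_i * P([X = x_i]) for a real-valued discrete X."""
    return sum(x * q for x, q in pmf.items())

# Two fair coins; X counts the heads, Y says whether the first coin shows heads.
p = {(c1, c2): Fraction(1, 4) for c1 in "HT" for c2 in "HT"}

def X(omega):
    return omega.count("H")

def Y(omega):
    return omega[0] == "H"

pmf_X = pmf_of(p, X)
pmf_XY = pmf_of(p, lambda omega: (X(omega), Y(omega)))   # the random variable (X, Y)
print(expectation(pmf_X))   # 1
```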

A family $p = \{p_v(\cdot)\}_v$ of probability measures parametrized on $v$ (where $v$ can range over $\{0, \ldots, n\}$ for some natural $n$) is called a stochastic kernel¹.



¹The general definition of stochastic kernel is more complicated (cf. [TM09]), but it reduces to this one in the discrete case, which is what we use in this thesis.
Notation: We will use capital letters $A, B, X, Y, Z$ to denote random variables and calligraphic letters $\mathcal{A}, \mathcal{B}, \mathcal{X}, \mathcal{Y}, \mathcal{Z}$ to denote their images. With a slight abuse of notation we will use $p$ (and $p(x), p(y)$) to denote either

• a probability distribution, when $x, y \in \Omega$, or

• a probability measure, when $x, y \in \mathcal{F}$ are events, or

• the probability mass function $P([X = x]), P([Y = y])$ of the random variables $X, Y$ respectively, when $x \in \mathcal{X}, y \in \mathcal{Y}$.

2.2 Probabilistic automata 

Let $\mu : S \to [0, 1]$ be a discrete probability distribution on a countable set $S$, and let the set of all discrete probability distributions on $S$ be $\mathcal{D}(S)$.

A probabilistic automaton [Seg95] is a quadruple $M = (S, \mathcal{L}, \hat{s}, \vartheta)$ where $S$ is a countable set of states, $\mathcal{L}$ is a finite set of labels or actions, $\hat{s}$ is the initial state, and $\vartheta$ is a transition function $\vartheta : S \to \mathcal{P}(\mathcal{D}(\mathcal{L} \times S))$. If $\vartheta(s) = \emptyset$ then $s$ is a terminal state. We write $s \to \mu$ for $\mu \in \vartheta(s)$, $s \in S$. Moreover, we write $s \xrightarrow{\ell} r$ for $s, r \in S$ whenever $s \to \mu$ and $\mu(\ell, r) > 0$. A fully probabilistic automaton is a probabilistic automaton satisfying $|\vartheta(s)| \le 1$ for all states. In such an automaton, when $\vartheta(s) \neq \emptyset$, we overload the notation and denote by $\vartheta(s)$ the distribution outgoing from $s$.

A path in a probabilistic automaton is a sequence $\sigma = s_0 \xrightarrow{\ell_1} s_1 \xrightarrow{\ell_2} \cdots$ where $s_i \in S$, $\ell_i \in \mathcal{L}$ and $s_i \xrightarrow{\ell_{i+1}} s_{i+1}$. A path can be finite, in which case it ends with a state. A path is complete if it is either infinite, or finite ending in a terminal state. Given a finite path $\sigma$, $last(\sigma)$ denotes its last state. Let $Paths_s(M)$ denote the set of all paths, $Paths^\star_s(M)$ the set of all finite paths, and $CPaths_s(M)$ the set of all complete paths of an automaton $M$, starting from the state $s$. We will omit $s$ if $s = \hat{s}$. Paths are ordered by the prefix relation, which we denote by $\le$. The trace of a path is the sequence of actions in $\mathcal{L}^\star \cup \mathcal{L}^\infty$ obtained by removing the states, hence for the above $\sigma$ we have $trace(\sigma) = \ell_1 \ell_2 \cdots$. If $\mathcal{L}' \subseteq \mathcal{L}$, then $trace_{\mathcal{L}'}(\sigma)$ is the projection of $trace(\sigma)$ on the elements of $\mathcal{L}'$.

Let $M = (S, \mathcal{L}, \hat{s}, \vartheta)$ be a (fully) probabilistic automaton, $s \in S$ a state, and let $\sigma \in Paths^\star_s(M)$ be a finite path starting in $s$. The cone generated by $\sigma$ is the set of complete paths $\langle \sigma \rangle = \{\sigma' \in CPaths_s(M) \mid \sigma \le \sigma'\}$. Given a fully probabilistic automaton $M = (S, \mathcal{L}, \hat{s}, \vartheta)$ and a state $s$, we can calculate the probability value $P_s(\sigma)$ of any finite path $\sigma$ starting in $s$ as follows:

$$P_s(s) = 1, \qquad P_s(\sigma \xrightarrow{\ell} s') = P_s(\sigma)\, \mu(\ell, s') \quad \text{where } last(\sigma) \to \mu$$
Let $\Omega_s = CPaths_s(M)$ be the sample space, and let $\mathcal{F}_s$ be the smallest $\sigma$-algebra induced by the cones generated by all the finite paths of $M$. Then $P_s$ induces a unique probability measure on $\mathcal{F}_s$ (which we will also denote by $P_s$) such that $P_s(\langle \sigma \rangle) = P_s(\sigma)$ for every finite path $\sigma$ starting in $s$. For $s = \hat{s}$ we write $P$ instead of $P_{\hat{s}}$.
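As an illustration, the following Python sketch encodes a small, made-up fully probabilistic automaton as a dictionary from states to a single distribution over (label, next state) pairs, with an empty dictionary marking a terminal state, and computes $P_s(\sigma)$ with the recursive rule above. The states, labels and the flat-list encoding of paths are assumptions for the example.

```python
from fractions import Fraction

# Hypothetical fully probabilistic automaton: each state has at most one
# outgoing distribution over (label, next state); {} marks a terminal state.
theta = {
    "s0": {("a", "s1"): Fraction(1, 2), ("b", "s2"): Fraction(1, 2)},
    "s1": {("c", "s3"): Fraction(1, 3), ("d", "s3"): Fraction(2, 3)},
    "s2": {},
    "s3": {},
}

def path_prob(path):
    """P_s(sigma) for a finite path encoded as [s0, l1, s1, ..., lk, sk]."""
    prob = Fraction(1)                          # P_s(s) = 1
    for i in range(0, len(path) - 1, 2):
        state, label, nxt = path[i], path[i + 1], path[i + 2]
        mu = theta[state]                       # the distribution leaving `state`
        prob *= mu.get((label, nxt), Fraction(0))
    return prob

def trace(path):
    """Remove the states, keeping the sequence of actions."""
    return path[1::2]

complete = [["s0", "a", "s1", "c", "s3"], ["s0", "a", "s1", "d", "s3"], ["s0", "b", "s2"]]
print(path_prob(complete[1]), trace(complete[1]))   # 1/3 ['a', 'd']
```

The three complete paths have probabilities $1/6$, $1/3$ and $1/2$, so they sum to 1, and the probability of the finite path $s_0 \xrightarrow{a} s_1$ equals the total probability of the complete paths in its cone.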

A (total) scheduler for a probabilistic automaton $M$ is a function $\zeta : Paths^\star(M) \to \mathcal{D}(\mathcal{L} \times S) \cup \{\bot\}$ such that for all finite paths $\sigma$, if $\vartheta(last(\sigma)) \neq \emptyset$ then $\zeta(\sigma) \in \vartheta(last(\sigma))$, and $\zeta(\sigma) = \bot$ otherwise. Hence, a scheduler selects one of the available transitions in each state, and therefore determines a fully probabilistic automaton, obtained by pruning from $M$ the alternatives that are not chosen by $\zeta$. A scheduler is history dependent since it takes into account the path and not only the current state. It is possible to define partial schedulers, i.e. schedulers that may halt the execution at any time. In this thesis, however, we will consider only total schedulers, to be more in line with the standard semantics of CCS.
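The next sketch uses a similar hypothetical dictionary encoding, now with a list of enabled distributions per state, to show a total scheduler that is genuinely history dependent: in state `s1` it chooses between the two enabled transitions according to the number of steps taken so far, so the same state is resolved differently on different paths. `None` plays the role of $\bot$, and `prob_under` computes path probabilities in the fully probabilistic automaton that the scheduler induces. The automaton and the policy are invented for illustration.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Hypothetical automaton: theta[s] is the *list* of distributions enabled in s.
theta = {
    "s0": [{("tau", "s0"): half, ("tau", "s1"): half}],
    "s1": [{("a", "s2"): Fraction(1)}, {("b", "s2"): Fraction(1)}],   # nondeterministic
    "s2": [],                                                           # terminal
}

def scheduler(path):
    """Total scheduler: an element of theta[last(path)], or None (bottom) in a terminal state."""
    options = theta[path[-1]]
    if not options:
        return None
    steps = len(path) // 2                  # transitions taken so far (history, not just state)
    return options[steps % len(options)]

def prob_under(path):
    """Probability of a finite path in the fully probabilistic automaton induced by the scheduler."""
    prob = Fraction(1)
    for i in range(0, len(path) - 1, 2):
        mu = scheduler(path[: i + 1])
        if mu is None:
            return Fraction(0)
        prob *= mu.get((path[i + 1], path[i + 2]), Fraction(0))
    return prob

print(prob_under(["s0", "tau", "s1", "a", "s2"]))               # 0: after one step, b is chosen
print(prob_under(["s0", "tau", "s0", "tau", "s1", "a", "s2"]))  # 1/4: after two steps, a is chosen
```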



2.3 CCS with internal probabilistic choice 

In this section we present an extension of standard CCS ([Mil89]) obtained 
by adding internal probabilistic choice. The resulting calculus can be seen as 
a simplified version of the probabilistic π-calculus presented in [HP00, PH05]
and it is similar to the one considered in [DPP05]. The restriction to CCS and 
to internal choice is suitable for the scope of this thesis. 

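Let $a$ range over a countable set of channel names.

The syntax of CCS$_p$ is the following:

$$
\begin{array}{rcll}
\alpha & ::= & a \mid \bar{a} \mid \tau & \text{prefixes} \\[6pt]
P, Q & ::= & & \text{processes} \\
 & & \alpha.P & \text{prefix} \\
 & \mid & P \,|\, Q & \text{parallel} \\
 & \mid & P + Q & \text{nondeterministic choice} \\
 & \mid & \bigoplus_i p_i P_i & \text{internal probabilistic choice} \\
 & \mid & (\nu a) P & \text{restriction} \\
 & \mid & !P & \text{replication} \\
 & \mid & 0 & \text{nil}
\end{array}
$$

where the $p_i$'s in the probabilistic choice should be non-negative and their sum should be 1. We will also use the notation $P_1 +_p P_2$ to represent a binary sum $\bigoplus_i p_i P_i$ with $p_1 = p$ and $p_2 = 1 - p$.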

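The semantics of a CCS$_p$ term is a probabilistic automaton defined inductively on the basis of the syntax according to the rules in Figure 2.1. We write $s \xrightarrow{a} \mu$ when $(s, a, \mu)$ is a transition of the probabilistic automaton. Given a process $Q$ and a measure $\mu$, we denote by $\mu \,|\, Q$ the measure $\mu'$ such that $\mu'(P \,|\, Q) = \mu(P)$ for all processes $P$ and $\mu'(R) = 0$ if $R$ is not of the form $P \,|\, Q$. Similarly $(\nu a)\mu = \mu'$ such that $\mu'((\nu a)P) = \mu(P)$.

$$
\begin{array}{ll}
\text{ACT} \quad \dfrac{}{\alpha.P \xrightarrow{\alpha} \delta(P)}
&
\text{RES} \quad \dfrac{P \xrightarrow{\alpha} \mu \qquad \alpha \neq a, \bar{a}}{(\nu a)P \xrightarrow{\alpha} (\nu a)\mu}
\\[14pt]
\text{SUM1} \quad \dfrac{P \xrightarrow{\alpha} \mu}{P + Q \xrightarrow{\alpha} \mu}
&
\text{SUM2} \quad \dfrac{Q \xrightarrow{\alpha} \mu}{P + Q \xrightarrow{\alpha} \mu}
\\[14pt]
\text{PAR1} \quad \dfrac{P \xrightarrow{\alpha} \mu}{P \,|\, Q \xrightarrow{\alpha} \mu \,|\, Q}
&
\text{PAR2} \quad \dfrac{Q \xrightarrow{\alpha} \mu}{P \,|\, Q \xrightarrow{\alpha} P \,|\, \mu}
\\[14pt]
\text{COM} \quad \dfrac{P \xrightarrow{a} \delta(P') \qquad Q \xrightarrow{\bar{a}} \delta(Q')}{P \,|\, Q \xrightarrow{\tau} \delta(P' \,|\, Q')}
&
\text{PROB} \quad \dfrac{}{\bigoplus_i p_i P_i \xrightarrow{\tau} \sum_i p_i \, \delta(P_i)}
\\[14pt]
\text{REP1} \quad \dfrac{P \xrightarrow{\alpha} \mu}{!P \xrightarrow{\alpha} \mu \,|\, !P}
&
\text{REP2} \quad \dfrac{P \xrightarrow{a} \delta(P_1) \qquad P \xrightarrow{\bar{a}} \delta(P_2)}{!P \xrightarrow{\tau} \delta(P_1 \,|\, P_2 \,|\, !P)}
\end{array}
$$

Figure 2.1: The semantics of CCS$_p$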

A transition of the form $P \xrightarrow{a} \delta(P')$, i.e. a transition having for target a Dirac measure, corresponds to a transition of a non-probabilistic automaton (a standard labeled transition system). Note that each rule of CCS$_p$ corresponds to one rule of CCS, except for PROB. The latter models the internal probabilistic choice: a silent $\tau$ transition is available from the sum to a measure containing all of its operands, with the corresponding probabilities.
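To see the rules at work, here is a hedged Python sketch of the transition function for a small fragment of CCS$_p$ (prefix, nondeterministic sum and internal probabilistic choice only; no parallel composition, restriction or replication), using a made-up nested-tuple encoding of processes. Measures are dictionaries from processes to probabilities. ACT, SUM1 and SUM2 only produce Dirac measures, while PROB yields a silent transition to $\sum_i p_i\,\delta(P_i)$.

```python
from fractions import Fraction

# Hypothetical encoding of a CCS_p fragment as nested (hashable) tuples:
#   NIL = ("nil",)      ("pre", alpha, P)      ("sum", P, Q)
#   ("prob", ((p1, P1), ..., (pn, Pn)))   with the p_i non-negative and summing to 1
NIL = ("nil",)

def transitions(proc):
    """All transitions (alpha, mu) of proc, where mu is a dict process -> probability."""
    kind = proc[0]
    if kind == "pre":                        # ACT: alpha.P --alpha--> delta(P)
        _, alpha, cont = proc
        return [(alpha, {cont: Fraction(1)})]
    if kind == "sum":                        # SUM1 and SUM2: either operand may move
        _, left, right = proc
        return transitions(left) + transitions(right)
    if kind == "prob":                       # PROB: silent step to sum_i p_i delta(P_i)
        mu = {}
        for p, cont in proc[1]:
            mu[cont] = mu.get(cont, Fraction(0)) + p
        return [("tau", mu)]
    return []                                # nil has no transitions

a0, b0 = ("pre", "a", NIL), ("pre", "b", NIL)
P = ("prob", ((Fraction(1, 3), a0), (Fraction(2, 3), b0)))   # a.0 +_{1/3} b.0
label, mu = transitions(P)[0]
print(label, mu[a0], mu[b0])   # tau 1/3 2/3
```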

Note that in the produced probabilistic automaton, all transitions to non-Dirac measures are silent. This is similar to the alternating model [HJ89]; however, our case is more general because the silent and non-silent transitions are not necessarily alternated. On the other hand, with respect to the simple probabilistic automata, the fact that the probabilistic transitions are silent looks like a restriction. It has been proved by Bandini and Segala [BS01], however, that the simple probabilistic automata and the alternating model are essentially equivalent, so, being in between, our model is equivalent as well.

Encoding message passing into CCS$_p$ Sometimes it is convenient to make message passing explicit in the notation of CCS$_p$. Namely, we enrich its syntax by allowing the prefixes to be $\bar{c}\langle a \rangle \mid c(x) \mid \tau$, where $c, a, x$ are names, and the semantic rule COM is substituted by:

$$
\text{COM}' \quad \dfrac{P \xrightarrow{\bar{c}\langle a \rangle} \delta(P') \qquad Q \xrightarrow{c(x)} \delta(Q')}{P \,|\, Q \xrightarrow{\tau} \delta(P' \,|\, Q'[a/x])}
$$

where $P \xrightarrow{\bar{c}\langle a \rangle} \delta(P')$ denotes a process that sends the name $a$ through channel $c$ and then evolves to $P'$, and $Q \xrightarrow{c(x)} \delta(Q')$ denotes a process that receives the name $x$ through channel $c$ and then evolves to $Q'$. Here $Q'[a/x]$ is the process $Q'$ in which every occurrence of $x$ is replaced by $a$.

The expressive power of CCS$_p$ with and without message passing is the same [Mil89]. In this thesis we will use this fact and consider explicit message passing as an alias for the corresponding encoding into the presentation of CCS$_p$ given in Figure 2.1.
Three



The rationale behind the use of 
information theory for leakage 



"Why, only why?" 
Nadia Vertti 

In this chapter we review the most important concepts related to the informa- 
tion theoretic approach to quantitative information flow. We aim at presenting 
these concepts in a contextualized way, discussing the intuition behind them 
and interpreting what they mean in terms of security. 

Plan of the Chapter Section 3.1 gives a brief overview of information
theory for communication. Section 3.2 introduces the information theoretic 
approach to information flow. Section 3.3 presents and compares several dif- 
ferent notions based on information theory that have been used in the literature 
to characterize uncertainty and leakage. 

3.1 Information theory and communication 

The study of information theory started with Claude E. Shannon's work on the 
problem of coding messages to be transmitted through unreliable (or noisy) 
channels. A communication channel is a (physical) means through which in- 
formation can be transmitted. The input is fed into the channel, but due to 
noise or any other problems that can occur during the transmission, the output 
of the channel may not reflect with fidelity the input. It is usual to describe 
the unreliable behavior of the channel in a probabilistic way. In the discrete 
(finite) case, if $\mathcal{A} = \{a_1, a_2, \ldots, a_n\}$ represents the possible inputs for the channel, and $\mathcal{B} = \{b_1, b_2, \ldots, b_m\}$ represents the possible outputs, the channel's probabilistic behavior can be represented as a channel matrix $M_{n \times m}$ where each element $M_{ij}$ ($1 \le i \le n$, $1 \le j \le m$) is defined as the probability of the channel outputting $b_j$ when the input is $a_i$. In this way, we can see the input and output as two correlated random variables linked by the channel's probabilistic behavior¹.
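For concreteness, a minimal Python sketch of a channel matrix with exact fractions; the matrix entries and the helper names are invented for illustration. Given an input distribution, it computes the output distribution $p(b_j) = \sum_i p(a_i) M_{ij}$ and the joint distribution $p(a_i, b_j) = p(a_i) M_{ij}$ that links the two random variables.

```python
from fractions import Fraction as F

# Hypothetical 2x3 channel matrix: rows are inputs a_i, columns are outputs b_j,
# and M[i][j] = p(b_j | a_i).
M = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(0), F(1, 2), F(1, 2)],
]
assert all(sum(row) == 1 for row in M)      # every row is a probability distribution

def output_distribution(prior, M):
    """p(b_j) = sum_i p(a_i) * M[i][j]."""
    return [sum(prior[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]

def joint(prior, M):
    """p(a_i, b_j) = p(a_i) * p(b_j | a_i)."""
    return [[prior[i] * M[i][j] for j in range(len(M[0]))] for i in range(len(M))]

prior = [F(1, 2), F(1, 2)]
print(output_distribution(prior, M))   # [Fraction(1, 4), Fraction(3, 8), Fraction(3, 8)]
```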

A unique feature of information theory is its use of a numerical measure of 
the amount of information gained when the contents of a message are learned. 
More specifically, information theory reasons about the degree of uncertainty 
of a certain random variable, and the amount of information that it can reveal 
about another random variable. Among the tools provided by information 
theory there are concepts such as entropy, conditional entropy, mutual information
and channel capacity, which will be reviewed in Section 3.3.1. We consider 
here only the discrete case, since this is enough for the scope of this thesis. 

3.2 Information theory and information flow 

Several works in the literature use an information theoretic approach to model 
the problem of information flow and define the leakage in a quantitative way, as 
for example [ZB05, CHM05, Mal07, MC08, MNS03, MNCM03, CPP08a]. The
idea is to model the computational system as an information theoretic channel.
The input represents the secret, the output represents the observable, and the
correlation between the input and output (mutual information) represents the
information leakage. The worst case leakage corresponds then to the capacity 
of the channel, which is by definition the maximum mutual information that 
can be obtained by varying the input distribution. 

In the works mentioned above, the notion of mutual information is based 
on Shannon entropy, which (because of its mathematical properties) is the 
most established measure of uncertainty. From the security point of view, this 
measure corresponds to a particular model of attack and a particular way of 
estimating the security threat (vulnerability of the secret). Other notions have 
been considered, and argued to be more appropriate for security in certain scenarios. These include: min-entropy [R61, Smi09], Bayes risk [CT91, CPP08b], guessing entropy [Mas94], and marginal guesswork [Pli00]. In Section 3.3 we
will discuss their meaning and show how they relate (or do not relate) to each 
other and to Shannon entropy. 

¹Note that we are assuming that channels are lossless, since the rows are probability distributions instead of sub-probability distributions.

Whatever definition of uncertainty (i.e. vulnerability) we want to adopt, the notion of leakage is inherent to the system and can be expressed in a uniform way as the difference between the initial uncertainty, i.e. the degree of ignorance about the secret before we run the system, and the remaining uncertainty, i.e. the degree of ignorance about the secret after we run the system and observe its outcome. Following the principle advocated by Smith [Smi09], and by many others:

$$\text{information leakage} = \text{initial uncertainty} - \text{remaining uncertainty} \tag{3.1}$$

In (3.1), the initial uncertainty depends solely on the input distribution, 
aka the a priori distribution or prior. Intuitively, the more uniform it is, the 
less we know about the secret (in the probabilistic sense). After we run the 
system, if there is a probabilistic correlation between input and output, then 
the observation of the output should increase our knowledge of the secret. This 
is determined by the fact that the distribution on the input changes: in fact we 
can update the probability of each input with the corresponding conditional 
probability of the same input, given the output. The new distribution is called 
the a posteriori distribution. In case the input and output are independent, 
then the a priori and the a posteriori distributions coincide and the knowledge 
should remain the same. We will use the attributes "a priori" (or "prior") 
and "a posteriori" to refer to before and after the observation of the output, 
respectively. 
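The update from the a priori to the a posteriori distribution is Bayes' rule, $p(a_i \mid b_j) = p(a_i) M_{ij} / p(b_j)$. A small Python sketch with two invented $2 \times 2$ channel matrices (the function name `posterior` is ours): when the output is independent of the input the posterior equals the prior, while a correlated output shifts it.

```python
from fractions import Fraction as F

def posterior(prior, M, j):
    """A posteriori distribution p(a_i | b_j) = p(a_i) M[i][j] / p(b_j), for p(b_j) > 0."""
    pb = sum(prior[i] * M[i][j] for i in range(len(prior)))
    if pb == 0:
        raise ValueError("the observed output has probability zero")
    return [prior[i] * M[i][j] / pb for i in range(len(prior))]

prior = [F(1, 2), F(1, 2)]
independent = [[F(1, 2), F(1, 2)],
               [F(1, 2), F(1, 2)]]    # the output does not depend on the input
correlated = [[F(3, 4), F(1, 4)],
              [F(1, 4), F(3, 4)]]     # the output tends to agree with the input
print(posterior(prior, independent, 0))   # equal to the prior
print(posterior(prior, correlated, 0))    # mass moves towards a_1
```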

The above intuitions should be reflected by any reasonable notion of un- 
certainty: it should be higher on more uniform distributions, and it should 
decrease or remain equal with the observation of related events. 

If the uncertainty is expressed in terms of Shannon entropy, then the initial 
uncertainty is the entropy of the input, the remaining uncertainty is the condi- 
tional entropy of the input given the output, and (3.1) matches exactly the defi- 
nition of mutual information. This justifies the notion of leakage adopted in the 
works mentioned before ([ZB05, CHM05, Mal07, MC08, MNS03, MNCM03, CPP08a]).

The analogy between information flow in a system and a (simple) channel 
works well when: 

(i) there is no nondeterminism, i.e. either the system is deterministic, or 
purely probabilistic; and 

(ii) there is a precise temporal relation between secrets and observables in the 
computations; namely, the value of the secret is chosen at the beginning 
of the computation, and the computation of the system produces an 
observable outcome with a probability that depends solely on the chosen 
input and on the system. Furthermore, each new run of the system is 
independent from the previous ones. 

Restriction (i) implies that for each secret there is exactly one conditional 
probability distribution on the observables, where the condition is the secret 
value. If a system is deterministic, then under the same input each run always produces the same output, with probability 1. Therefore the matrix contains only 0s and 1s. Yet the problem of inferring the secret is interesting, because the same output may correspond to different inputs. If the system is probabilistic, i.e. it uses some randomized mechanisms, then the matrix usually contains probabilities different from 0 and 1.

Restriction (ii) ensures that this conditional distribution depends uniquely 
on the system (not on the input distribution). These conditional probabilities 
constitute the channel matrix. Note that in a (basic) information-theoretic 
channel the matrix must be invariant with respect to the input distribution, 
which is exactly what condition (ii) guarantees. 

Unfortunately, usually conditions (i) and (ii) are too restrictive for real-life 
systems: 

• Specifications typically need to use nondeterminism in order to abstract 
from implementation details. This is particularly compelling in the case 
of concurrent and distributed systems: The order in which the various 
components get executed and their interactions depend on scheduling 
policies that may differ from implementation to implementation. Fur- 
thermore, even if the scheduling policy is fixed, there are run time cir- 
cumstances that may influence the relative speed of the processes. Non- 
determinism is, in practice, an unavoidable aspect of concurrency. 

• Secrets and observables often alternate and interact during an execu- 
tion. In particular, the choice of a new secret may depend on previous 
observables. Furthermore, new executions of the systems may depend 
on previous ones. This may be due to the way the system works, or to 
the presence of an active adversary that may use the knowledge derived 
from previous observations to try to tamper with the mechanisms of the 
system, with the purpose of increasing the leakage. Examples of such 
systems, that we call here interactive systems (where interaction refers 
to the interplay between secrets and observables), can be found in the 
areas of game theory, auction protocols, web servers, GUI applications, 
etc. 

In this thesis we consider the challenges of extending the information-theoretic approach to cases where these conditions are relaxed. More specifically, Chapter 4 concerns the suppression of condition (ii), and Chapter 6 deals with the suppression of condition (i).

3.3 Uncertainty and leakage 

In this section we recall various definitions of uncertainty based on information 
theory proposed in the literature, and we discuss the relation with security 
attacks and the way of measuring their success. In general we consider the 
kind of threats that in the model of Köpf and Basin [KB07] are called brute-force guessing attacks, which can be summarized as follows: The goal of the adversary is to determine the value of a random variable. He can make a series of queries to an oracle. Each query must have a yes/no answer. In general the adversary is adaptive, i.e. he can choose the next query depending on the
answers to the previous ones. We assume that the adversary knows the a priori 
probability distribution. In this section, when we talk about the meaning in 
security of a particular measure of uncertainty, we refer to the work in [KB07]. 

In the following, $A, B$ denote two discrete random variables with finitely many values $\mathcal{A} = \{a_1, \ldots, a_n\}$, $\mathcal{B} = \{b_1, \ldots, b_m\}$, and probability distributions $p_A(\cdot)$, $p_B(\cdot)$, respectively. We will use $A \wedge B$ to represent the random variable with carrier $\mathcal{A} \times \mathcal{B}$ and joint probability distribution $p_{A \wedge B}(a, b) = p_A(a) \cdot p(b \mid A = a)$, while $A \cdot B$ will denote the random variable with carrier $\mathcal{A} \times \mathcal{B}$ and probability distribution defined as product, i.e. $p_{A \cdot B}(a, b) = p_A(a) \cdot p_B(b)$.

Clearly, if $A$ and $B$ are independent, we have $A \wedge B = A \cdot B$. We shall omit the subscripts on the probabilities when they are clear from the context. In reference to a channel, in general $A$ will denote the input (secret), and $B$ the output (observable).

3.3.1 Shannon entropy 

The (Shannon) entropy of $A$ is defined as

$$H(A) = -\sum_{a \in \mathcal{A}} p(a) \log p(a)$$

The entropy measures the uncertainty of $A$. It takes its minimum value $H(A) = 0$ when $p_A(\cdot)$ is a point mass (also called Dirac delta). The maximum value $H(A) = \log |\mathcal{A}|$ is obtained when $p_A(\cdot)$ is the uniform distribution. Usually the base of the logarithm is set to be 2 and the entropy is measured in bits. Roughly speaking, $m$ bits of entropy means that we have $2^m$ values to choose from, assuming a uniform distribution.
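A short numerical check of these bounds, as a Python sketch (base-2 logarithms and the usual convention $0 \log 0 = 0$; the function name is ours): the uniform distribution on 8 values gives $\log_2 8 = 3$ bits, a point mass gives 0, and a skewed distribution on 4 values falls in between.

```python
import math

def shannon_entropy(p):
    """H(A) = -sum_a p(a) log2 p(a), in bits; terms with p(a) = 0 contribute 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)

uniform = [1 / 8] * 8                     # maximum: log2 8 = 3 bits
point_mass = [1.0] + [0.0] * 7            # minimum: 0 bits
skewed = [1 / 2, 1 / 4, 1 / 8, 1 / 8]     # strictly between 0 and log2 4 = 2 bits

for name, dist in [("uniform", uniform), ("point mass", point_mass), ("skewed", skewed)]:
    print(f"{name}: {shannon_entropy(dist)} bits")
```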

The conditional entropy of $A$ given $B$ is defined as

$$H(A \mid B) = \sum_{b \in \mathcal{B}} p(b)\, H(A \mid B = b) \tag{3.2}$$

where

$$H(A \mid B = b) = -\sum_{a \in \mathcal{A}} p(a \mid b) \log p(a \mid b)$$

The conditional entropy measures the uncertainty of $A$ when $B$ is known. It is well-known that $0 \le H(A \mid B) \le H(A)$. The minimum value, 0, is obtained when $A$ is completely determined by $B$. The maximum value $H(A)$ is obtained when $A$ and $B$ are independent.

The mutual information between $A$ and $B$ is defined as

$$I(A; B) = H(A) - H(A \mid B) \tag{3.3}$$
The mutual information measures the amount of information about $A$ that we gain by observing $B$. It can be shown that $I(A; B) = I(B; A)$ and $0 \le I(A; B) \le H(A)$. If $C$ is a third random variable, the conditional mutual information between $A$ and $B$ given $C$ is defined as

$$I(A; B \mid C) = H(A \mid C) - H(A \mid B, C)$$
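The following Python sketch computes $H(A \mid B)$ directly from (3.2) and $I(A;B)$ from (3.3), with the joint distribution given as a matrix `joint[i][j]` $= p(a_i, b_j)$; the example distributions and function names are invented. It illustrates the two extreme cases ($B$ determining $A$, and $B$ independent of $A$) and the symmetry $I(A;B) = I(B;A)$, checked by transposing the joint matrix.

```python
import math

def entropy(p):
    """Shannon entropy in bits, with 0 log 0 = 0."""
    return sum(-x * math.log2(x) for x in p if x > 0)

def conditional_entropy(joint):
    """H(A | B) = sum_b p(b) H(A | B = b), as in (3.2); joint[i][j] = p(a_i, b_j)."""
    total = 0.0
    for j in range(len(joint[0])):
        pb = sum(row[j] for row in joint)
        if pb > 0:
            total += pb * entropy([row[j] / pb for row in joint])
    return total

def mutual_information(joint):
    """I(A; B) = H(A) - H(A | B), as in (3.3)."""
    pa = [sum(row) for row in joint]
    return entropy(pa) - conditional_entropy(joint)

def transpose(joint):
    """Swap the roles of A and B."""
    return [list(col) for col in zip(*joint)]

determined = [[0.5, 0.0], [0.0, 0.5]]       # B reveals A completely
independent = [[0.25, 0.25], [0.25, 0.25]]  # B says nothing about A
noisy = [[0.3, 0.1], [0.2, 0.4]]
print(mutual_information(determined), mutual_information(independent))   # 1.0 0.0
```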

The (conditional) entropy and mutual information respect the chain rules. Namely, given the random variables $A_1, A_2, \ldots, A_k$, $B$ and $C$, we have:

$$H(A_1, A_2, \ldots, A_k \mid C) = \sum_{i=1}^{k} H(A_i \mid A_1, \ldots, A_{i-1}, C)$$

$$I(A_1, A_2, \ldots, A_k; B \mid C) = \sum_{i=1}^{k} I(A_i; B \mid A_1, \ldots, A_{i-1}, C) \tag{3.4}$$
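The chain rules can be checked numerically. In this Python sketch (hypothetical helper names, joint distribution stored as a dictionary over outcome tuples) conditional entropy is computed directly as $\sum_g p(g)\,H(\cdot \mid g)$ rather than through the chain rule itself, so the comparison is not circular. The example uses $k = 2$, an empty $C$, two fair independent bits $A_1, A_2$, and an output $B$ equal to $A_1$ xor $A_2$ flipped with probability 0.1.

```python
import math
from collections import defaultdict
from itertools import product

def cond_entropy(joint, xs, given=()):
    """H(X_xs | X_given) in bits, computed as sum_g p(g) H(X_xs | X_given = g).

    `joint` maps outcome tuples to probabilities; xs and given are tuples of coordinates.
    """
    groups = defaultdict(lambda: defaultdict(float))
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        x = tuple(outcome[i] for i in xs)
        groups[g][x] += p
    total = 0.0
    for dist in groups.values():
        pg = sum(dist.values())
        if pg > 0:
            total += pg * sum(-(q / pg) * math.log2(q / pg) for q in dist.values() if q > 0)
    return total

def cond_mi(joint, xs, ys, given=()):
    """I(X_xs ; X_ys | X_given) = H(X_xs | X_given) - H(X_xs | X_ys, X_given)."""
    return cond_entropy(joint, xs, given) - cond_entropy(joint, xs, ys + given)

# Invented example: A1, A2 fair independent bits; B = A1 xor A2, flipped with probability 0.1.
joint = {(a1, a2, b): 0.25 * (0.9 if b == (a1 ^ a2) else 0.1)
         for a1, a2, b in product((0, 1), repeat=3)}
A1, A2, B = (0,), (1,), (2,)

# H(A1, A2) = H(A1) + H(A2 | A1)
print(cond_entropy(joint, A1 + A2), cond_entropy(joint, A1) + cond_entropy(joint, A2, A1))
# (3.4): I(A1, A2; B) = I(A1; B) + I(A2; B | A1)
print(cond_mi(joint, A1 + A2, B), cond_mi(joint, A1, B) + cond_mi(joint, A2, B, A1))
```

In this example $I(A_1; B) = 0$: $A_1$ alone says nothing about $B$, and the whole $1 - h(0.1)$ bits (with $h$ the binary entropy function) appear in the conditional term $I(A_2; B \mid A_1)$.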

A discrete memoryless channel is a tuple $(\mathcal{A}, \mathcal{B}, p(\cdot \mid \cdot))$, where $\mathcal{A}, \mathcal{B}$ are the sets of input and output symbols, respectively, and $p(b \mid a)$ is the probability of observing the output symbol $b$ when the input symbol is $a$. These conditional probabilities constitute the channel matrix. An input distribution $p_A(\cdot)$ over $\mathcal{A}$ together with the channel determine the joint distribution $p(a, b) = p(b \mid a) \cdot p(a)$ and consequently $I(A; B)$. The maximum $I(A; B)$ over all possible input distributions is the channel's capacity $C$:

$$C = \max_{p_A(\cdot)} I(A; B)$$

The famous Channel Coding Theorem by Shannon relates the capacity of the channel to its maximum transmission rate. In brief, the channel capacity is a tight upper bound for the maximum rate by which information can be reliably transmitted using the channel: for every rate $R < C$ and every acceptable probability of error $\epsilon > 0$, there are a natural number $n$ and a coding of rate $R$ for which $n$ uses of the channel will result in messages being transmitted with probability of error at most $\epsilon$.
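The maximization over input distributions in the definition of $C$ can be carried out numerically. A common choice is the Blahut-Arimoto algorithm, a standard iterative method from information theory that is not part of this thesis; the sketch below is a plain-Python version under that assumption, with a hypothetical function name, returning the capacity in bits together with the (approximately) maximizing input distribution.

```python
import math

def blahut_arimoto(M, max_iter=10_000, tol=1e-12):
    """Approximate the capacity (bits) of channel matrix M, with M[i][j] = p(b_j | a_i).

    Returns (capacity, input distribution that approximately achieves it).
    """
    n, m = len(M), len(M[0])
    p = [1.0 / n] * n                                   # start from the uniform input
    for _ in range(max_iter):
        q = [sum(p[i] * M[i][j] for i in range(n)) for j in range(m)]   # output distribution
        # Kullback-Leibler divergence D(M[i] || q), in nats, for every input row
        d = [sum(M[i][j] * math.log(M[i][j] / q[j]) for j in range(m) if M[i][j] > 0)
             for i in range(n)]
        w = [p[i] * math.exp(d[i]) for i in range(n)]   # favour the more informative inputs
        total = sum(w)
        new_p = [x / total for x in w]
        converged = max(abs(x - y) for x, y in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    q = [sum(p[i] * M[i][j] for i in range(n)) for j in range(m)]
    capacity = sum(p[i] * M[i][j] * math.log2(M[i][j] / q[j])
                   for i in range(n) for j in range(m) if M[i][j] > 0)
    return capacity, p

bsc = [[0.9, 0.1], [0.1, 0.9]]    # binary symmetric channel, crossover probability 0.1
z = [[1.0, 0.0], [0.5, 0.5]]      # Z-channel
print(blahut_arimoto(bsc)[0], blahut_arimoto(z)[0])
```

For the binary symmetric channel the result should be $1 - h(0.1) \approx 0.531$ bits, where $h$ is the binary entropy function; for the Z-channel it should approach $\log_2 1.25 \approx 0.322$ bits, attained at the input distribution $(0.6, 0.4)$.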

Meaning in security To explain what $H(A)$ represents from the security point of view, consider a partition $\{\mathcal{A}_i\}_{i \in I}$ of $\mathcal{A}$. The adversary is allowed to ask questions of the form "does $A \in \mathcal{A}_i$?" according to some strategy. Let $n(a)$ be the number of questions that are needed to determine the value of $a$, when $A = a$. Then $H(A)$ represents the lower bound to the expected value of $n(\cdot)$, with respect to all possible partitions and strategies of the adversary [Pli00, KB07].
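One way to see this bound concretely, assuming the adversary may ask about arbitrary subsets: a binary Huffman tree for $p_A$ is an optimal questioning strategy, each internal node asking in which of its two subtrees $A$ lies, and the standard source-coding bound gives $H(A) \le E[n] < H(A) + 1$. Huffman coding is our illustrative choice, not a construction from [Pli00, KB07]; the Python sketch below uses hypothetical function names.

```python
import heapq
import math

def huffman_lengths(p):
    """Depth of each value in a binary Huffman tree for the pmf p.

    Read as a strategy: every internal node is one yes/no question, so the
    depth of a value is the number of questions n(a) needed to identify it.
    """
    if len(p) == 1:
        return [0]                                  # nothing to ask
    heap = [(prob, [i]) for i, prob in enumerate(p)]
    heapq.heapify(heap)
    depth = [0] * len(p)
    while len(heap) > 1:
        p1, left = heapq.heappop(heap)              # the two least likely groups
        p2, right = heapq.heappop(heap)
        for i in left + right:                      # one more question for each merged value
            depth[i] += 1
        heapq.heappush(heap, (p1 + p2, left + right))
    return depth

def entropy(p):
    return sum(-x * math.log2(x) for x in p if x > 0)

def expected_questions(p):
    return sum(x * n for x, n in zip(p, huffman_lengths(p)))

for dist in ([0.5, 0.25, 0.125, 0.125], [0.4, 0.3, 0.3]):
    print(entropy(dist), expected_questions(dist))
```

For the dyadic distribution the bound is attained exactly (1.75 questions on average), while for $(0.4, 0.3, 0.3)$ the strategy needs 1.6 questions against $H(A) \approx 1.571$ bits.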
3.3.2 Min-entropy

In [R61], Rényi introduced a one-parameter family of entropy measures, intended as a generalization of Shannon entropy. The Rényi entropy of order $\alpha$ ($\alpha > 0$, $\alpha \neq 1$) of a random variable $A$ is defined as

$$H_\alpha(A) = \frac{1}{1 - \alpha} \log \sum_{a \in \mathcal{A}} p(a)^\alpha$$



Rényi's motivations were of an axiomatic nature: Shannon entropy satisfies four axioms, namely symmetry, continuity, value 1 on the Bernoulli uniform distribution, and the chain rule²:

$$H(A \wedge B) = H(A) + H(B \mid A) \tag{3.5}$$

The entropy of the joint probability, $H(A \wedge B)$, is more commonly denoted by $H(A, B)$. We will use the latter notation in the following.

Shannon entropy is also the only function that satisfies those axioms. If we replace, however, (3.5) with a weaker property representing the additivity of entropy for independent distributions:

$$H(A \cdot B) = H(A) + H(B)$$

then there are more functions satisfying the axioms, among which are all those of Rényi's family.

Shannon entropy is obtained by taking the limit of $H_\alpha$ as $\alpha$ approaches 1. In fact we can easily prove, using l'Hôpital's rule, that

$$H_1(A) = \lim_{\alpha \to 1} H_\alpha(A) = -\sum_{a \in \mathcal{A}} p(a) \log p(a)$$



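We are particularly interested in the limit of $H_\alpha$ as $\alpha$ approaches $\infty$. This is called min-entropy. It can easily be proven that

$$H_\infty(A) \stackrel{\text{def}}{=} \lim_{\alpha \to \infty} H_\alpha(A) = -\log \max_{a \in \mathcal{A}} p(a)$$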
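The two limits can be checked numerically. The following Python sketch is an illustration with base-2 logarithms (the function name is ours); it computes $H_\alpha$ and treats $\alpha = 1$ and $\alpha = \infty$ as the Shannon and min-entropy limits.

```python
import math


def renyi_entropy(dist, alpha):
    """Rényi entropy H_alpha(A) in bits for a list of probabilities.

    alpha == 1 returns the Shannon entropy (the limit alpha -> 1) and
    alpha == math.inf returns the min-entropy -log2 max_a p(a)."""
    support = [p for p in dist if p > 0]
    if alpha == 1:
        return -sum(p * math.log2(p) for p in support)
    if math.isinf(alpha):
        return -math.log2(max(support))
    return math.log2(sum(p ** alpha for p in support)) / (1 - alpha)
```

For $p = (1/2, 1/4, 1/4)$, orders close to 1 give values close to the Shannon entropy 1.5, large orders approach the min-entropy 1, and the values decrease as $\alpha$ grows.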

Rényi also considered the $\alpha$-generalization of the Kullback-Leibler divergence, which is defined as follows (assuming that $p$ and $q$ are distributions on the same set $\mathcal{X}$):

$$D_{KL}(p \parallel q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$$



¹The original axiom, called the grouping axiom, does not mention the conditional entropy. It corresponds, however, to the chain rule if the conditional entropy is defined as in (3.2).
Rényi's $\alpha$-generalization is:

$$D_\alpha(p \parallel q) = \frac{1}{\alpha - 1} \log \sum_{x \in \mathcal{X}} p(x)^\alpha \, q(x)^{1-\alpha}$$

The standard case, i.e. the Kullback-Leibler divergence, is again obtained by taking the limit of $D_\alpha$ as $\alpha \to 1$.

The interest of the above for our purposes lies in the fact that Shannon mutual information can equivalently be defined in terms of the Kullback-Leibler divergence (see for instance [CT91]):

$$I(A;B) = D_{KL}(A \wedge B \parallel A \cdot B)$$

where $A \wedge B$ denotes the joint distribution of $A$ and $B$, and $A \cdot B$ the product of their marginal distributions. Therefore, it seems natural to define the $\alpha$-generalization of the mutual information as:

$$I^*_\alpha(A;B) = D_\alpha(A \wedge B \parallel A \cdot B)$$

Other $\alpha$-generalizations of the mutual information, based on the same idea, are explored in [Csi95].

As $\alpha \to \infty$, the above definition gives the following min-version of the mutual information:

$$I^*_\infty(A;B) \stackrel{\text{def}}{=} \lim_{\alpha \to \infty} I^*_\alpha(A;B) = \log \max_{a,b} \frac{p(a,b)}{p(a)\,p(b)} \qquad (3.6)$$
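A small Python sketch (ours, with base-2 logarithms, assuming $q(x) > 0$ wherever $p(x) > 0$) makes the construction concrete: it computes $D_\alpha$, with $\alpha = 1$ and $\alpha = \infty$ as the limiting cases, and obtains $I^*_\alpha(A;B)$ by comparing the joint distribution with the product of its marginals. The function names are hypothetical.

```python
import math


def renyi_divergence(p, q, alpha):
    """D_alpha(p || q) in bits; alpha == 1 gives Kullback-Leibler and
    alpha == math.inf gives log2 max_x p(x) / q(x)."""
    pairs = [(px, qx) for px, qx in zip(p, q) if px > 0]  # p(x) = 0 adds nothing
    if alpha == 1:
        return sum(px * math.log2(px / qx) for px, qx in pairs)
    if math.isinf(alpha):
        return math.log2(max(px / qx for px, qx in pairs))
    s = sum(px ** alpha * qx ** (1 - alpha) for px, qx in pairs)
    return math.log2(s) / (alpha - 1)


def mutual_information_alpha(joint, alpha):
    """I*_alpha(A;B) = D_alpha(joint || product of marginals); joint[a][b] = p(a, b)."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    p = [joint[a][b] for a in range(len(pa)) for b in range(len(pb))]
    q = [pa[a] * pb[b] for a in range(len(pa)) for b in range(len(pb))]
    return renyi_divergence(p, q, alpha)
```

With the joint distribution (0.4, 0.1; 0.1, 0.4), order 1 gives the Shannon mutual information $1 - h(0.2) \approx 0.278$ bits, while order $\infty$ gives $\log 1.6 \approx 0.678$ bits, the logarithm of the largest ratio $p(a,b)/(p(a)p(b))$.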

Another natural way to generalize $I(A;B)$ would be to replace $H$ by $H_\alpha$ in Definition 3.3. However, Rényi did not define the $\alpha$-generalization of the conditional entropy, and there is no agreement on what it should be.

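Various researchers, including Cachin [Cac97], have considered the following definition, based on (3.2):

$$H_\alpha^{Cachin}(A \mid B) = \sum_{b \in \mathcal{B}} p(b) \, H_\alpha(A \mid B = b)$$

which, as $\alpha \to \infty$, becomes

$$H_\infty^{Cachin}(A \mid B) = -\sum_{b \in \mathcal{B}} p(b) \log \max_{a \in \mathcal{A}} p(a \mid b) \qquad (3.7)$$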

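An alternative proposal for $H_\infty(\cdot \mid \cdot)$ came from Smith [Smi09]²:

$$H_\infty^{Smith}(A \mid B) = -\log \sum_{b \in \mathcal{B}} \max_{a \in \mathcal{A}} p(a,b) \qquad (3.8)$$

Using (3.7) and (3.8), and the analogue of (3.3), we can define $I_\infty^{Cachin}$ and $I_\infty^{Smith}$.³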



²The same formulation had already been used by Dodis et al. in [DORS04], and Smith proposed it independently. Since it is Smith's work on the subject that motivates the approach used in this thesis, we opt to refer to this formulation as Smith's.

³The notation $I_\infty^{Smith}$ is ours. Smith himself chooses not to adopt it, since $I_\infty^{Smith}$ is not symmetric.
Meaning in security The min-entropy can be related to a model of adversary who is allowed to ask exactly one question, which must be of the form "is $A = a$?" (one-try attacks). More precisely, the min-entropy $H_\infty(A)$ represents the logarithm of the inverse of the probability of success for this kind of attack with the best strategy, which consists, of course, in choosing the $a$ with the maximum probability.

As for $H_\infty(A \mid B)$ and $I_\infty(A;B)$, the most interesting versions in terms of security seem to be those of Smith. In fact, in this thesis we adopt his approach to information leakage, and we will, from now on, use the following notation:

• $H_\infty(A \mid B)$ stands for $H_\infty^{Smith}(A \mid B)$ and is referred to as conditional min-entropy;

• $I_\infty(A;B)$ stands for $I_\infty^{Smith}(A;B)$ and is referred to as min-entropy leakage.

In fact, the conditional min-entropy $H_\infty(A \mid B)$ represents the logarithm of the inverse of the (expected value of the) probability that the same kind of adversary succeeds in guessing the value of $A$ a posteriori, i.e. after observing the result of $B$. The complement of this probability is also known as probability of error or Bayes risk. Since in general $B$ and $A$ are correlated, observing $B$ can increase (and never decreases) the probability of success. In fact, we can prove formally that $H_\infty(A \mid B) \leq H_\infty(A)$, with equality if $A$ and $B$ are independent. The min-entropy leakage $I_\infty(A;B)$ corresponds to the logarithm of the ratio between the probabilities of success a posteriori and a priori, which is a natural notion of leakage. Here $I_\infty(A;B)$ is in the format of (3.1), but the difference becomes a ratio due to the presence of the logarithms. Note that $I_\infty(A;B) \geq 0$, which seems desirable for a good notion of leakage. It has been proven in [BCP09] that the min-capacity $C_\infty$, i.e. the maximum of $I_\infty(A;B)$ over all input distributions, is obtained at the uniform distribution, and that it is equal to the logarithm of the sum of the maxima of each column in the channel matrix, i.e. $C_\infty = \log \sum_{b \in \mathcal{B}} \max_{a \in \mathcal{A}} p(b \mid a)$.
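The following Python sketch (ours; function names and dictionary keys are hypothetical) computes, in bits, $H_\infty(A)$, the two conditional versions (3.7) and (3.8), Smith's leakage taken as $I_\infty(A;B) = H_\infty(A) - H_\infty(A \mid B)$ (the analogue of (3.3)), and the closed form for $C_\infty$ from [BCP09]. It assumes a prior list `prior[a]` and a channel matrix `channel[a][b]` $= p(b \mid a)$.

```python
import math


def min_entropy_measures(prior, channel):
    """Min-entropy quantities in bits; prior[a] = p(a), channel[a][b] = p(b | a)."""
    n_a, n_b = len(prior), len(channel[0])
    joint = [[prior[a] * channel[a][b] for b in range(n_b)] for a in range(n_a)]
    p_b = [sum(joint[a][b] for a in range(n_a)) for b in range(n_b)]
    h_prior = -math.log2(max(prior))                        # H_inf(A)
    col_max = [max(joint[a][b] for a in range(n_a)) for b in range(n_b)]
    h_smith = -math.log2(sum(col_max))                      # (3.8)
    # (3.7): max_a p(a | b) = max_a p(a, b) / p(b)
    h_cachin = -sum(p_b[b] * math.log2(col_max[b] / p_b[b])
                    for b in range(n_b) if p_b[b] > 0)
    return {"H_inf": h_prior, "H_smith": h_smith, "H_cachin": h_cachin,
            "leakage": h_prior - h_smith}


def min_capacity(channel):
    """C_inf = log2 of the sum of the column maxima of the channel matrix."""
    return math.log2(sum(max(col) for col in zip(*channel)))
```

For the channel with rows (1, 0) and (1/2, 1/2) and a uniform prior, the two conditional min-entropies differ (about 0.415 and 0.439 bits), and the leakage, $\log 1.5 \approx 0.585$ bits, coincides with $C_\infty$, as expected since the min-capacity is reached at the uniform prior.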

The definition of $I^*_\infty(A;B)$ in (3.6) also has an interpretation in security: it represents the maximum gain in the probability of success, i.e. the maximum ratio between the a posteriori and the a priori probability, since $\frac{p(a,b)}{p(a)\,p(b)} = \frac{p(a \mid b)}{p(a)}$. Note that $I^*_\infty(A;B)$ is also always non-negative, and it is 0 if and only if $A$ and $B$ are independent. More generally, $D_{KL}(p \parallel q)$ and its $\alpha$-extension $D_\alpha(p \parallel q)$ should represent the "inefficiency" of an adversary who bases his strategy on the distribution $q$, when in fact the real distribution is $p$. Hence $I^*_\alpha(A;B)$, defined as $D_\alpha(A \wedge B \parallel A \cdot B)$, should represent the gain of the adversary in revising his strategy according to the knowledge of the correlation between $A$ and $B$.

Concerning $H_\alpha^{Cachin}$ and $I_\alpha^{Cachin}$, they have some nice properties. For instance, they enjoy weak versions of the chain rule (3.5). More precisely, the "$=$" in (3.5) becomes "$\geq$" for $\alpha < 1$, and "$\leq$" for $\alpha > 1$. There is no general relation between $H_\alpha^{Cachin}(A \mid B)$ and $H_\alpha(A)$, and in particular $I_\alpha^{Cachin}$ is not guaranteed to be non-negative.
3.3.3 Guessing entropy

The notion of guessing entropy was introduced by Massey in [Mas94]. Let us assume, for simplicity, that the elements of $\mathcal{A}$ are ordered by decreasing probabilities, i.e. if $1 \leq i < j \leq n$ then $p(a_i) \geq p(a_j)$. Then the guessing entropy is defined as follows:

$$H_G(A) = \sum_{1 \leq i \leq |\mathcal{A}|} i \, p(a_i)$$

Massey did not define the notion of conditional guessing entropy. In some works, like [Cac97, KB07], it is defined analogously to (3.2):

$$H_G(A \mid B) = \sum_{b \in \mathcal{B}} p(b) \, H_G(A \mid B = b)$$

Meaning in security Guessing entropy represents an adversary who is allowed to repeatedly ask questions of the form "is $A = a$?". More precisely, $H_G(A)$ represents the expected number of questions that the adversary needs to ask to determine the value of $A$, assuming that he follows the best strategy, which consists, of course, in choosing the $a$'s in order of decreasing probability.

$H_G(A \mid B)$ represents the expected number of questions a posteriori, i.e. after observing the value of $B$ and reordering the queries according to the updated probabilities (i.e. the queries will be chosen in order of decreasing a posteriori probabilities).

Also in this case, $H_G(A \mid B)$ is not necessarily smaller than or equal to $H_G(A)$, so the corresponding notion of mutual information is not guaranteed to be non-negative⁴.
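A minimal Python sketch of the two definitions above, assuming the channel is given as `channel[a][b]` $= p(b \mid a)$ (function names are ours): the conditional version sorts each a posteriori distribution separately before weighting it by $p(b)$.

```python
def guessing_entropy(dist):
    """Massey's H_G(A) = sum_i i * p(a_i), probabilities sorted decreasingly."""
    return sum(i * p for i, p in enumerate(sorted(dist, reverse=True), start=1))


def conditional_guessing_entropy(prior, channel):
    """H_G(A | B) = sum_b p(b) H_G(A | B = b); channel[a][b] = p(b | a)."""
    total = 0.0
    for b in range(len(channel[0])):
        column = [prior[a] * channel[a][b] for a in range(len(prior))]  # p(a, b)
        p_b = sum(column)
        if p_b > 0:
            # reorder the guesses according to the a posteriori distribution
            total += p_b * guessing_entropy([x / p_b for x in column])
    return total
```

For a uniform secret over four values, $H_G(A) = 2.5$; a noiseless channel brings the expected number of guesses down to 1, while a channel with a single output leaves it at 2.5.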

3.3.4 Marginal guesswork

The marginal guesswork is a variant of guessing entropy that was proposed by Pliam [Pli00]. It is parametric in a number $\eta$ with $0 < \eta \leq 1$, and is defined as follows. Again, we assume that the elements of $\mathcal{A}$ are ordered by decreasing probabilities.

$$H_\eta(A) = \min \Big\{ j \;\Big|\; \sum_{1 \leq i \leq j} p(a_i) \geq \eta \Big\}$$

Pliam did not define the conditional version of marginal guesswork, but in [KB07] it is defined following (3.2):

$$H_\eta(A \mid B) = \sum_{b \in \mathcal{B}} p(b) \, H_\eta(A \mid B = b)$$

⁴This problem is inherent to the probabilistic case, and therefore it does not occur in [KB07], since that work considers only deterministic systems.
Meaning in security Consider again an adversary who is allowed to repeatedly ask questions of the form "is $A = a$?". $H_\eta(A)$ represents the minimum number of questions that the adversary needs to ask to determine the value of $A$ with probability at least $\eta$.

$H_\eta(A \mid B)$ represents the same notion, but using the a posteriori probabilities. Again, it is not necessarily the case that $H_\eta(A \mid B) \leq H_\eta(A)$.
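The last remark can be seen on a small instance. In the Python sketch below (ours; the channel is a hypothetical example), the prior is (1/2, 1/4, 1/4) and $\eta = 1/2$, so one guess suffices a priori; the channel splits the prior into the equally likely posteriors (0.4, 0.3, 0.3) and (0.6, 0.2, 0.2), which need 2 and 1 guesses respectively.

```python
def marginal_guesswork(dist, eta):
    """Pliam's H_eta(A): fewest guesses, in decreasing-probability order,
    whose total probability reaches eta, for 0 < eta <= 1."""
    cumulative = 0.0
    for j, p in enumerate(sorted(dist, reverse=True), start=1):
        cumulative += p
        if cumulative >= eta - 1e-12:  # small slack for floating-point sums
            return j
    raise ValueError("eta must be at most the total probability mass")


def conditional_marginal_guesswork(prior, channel, eta):
    """H_eta(A | B) = sum_b p(b) H_eta(A | B = b); channel[a][b] = p(b | a)."""
    total = 0.0
    for b in range(len(channel[0])):
        column = [prior[a] * channel[a][b] for a in range(len(prior))]  # p(a, b)
        p_b = sum(column)
        if p_b > 0:
            total += p_b * marginal_guesswork([x / p_b for x in column], eta)
    return total


prior = [0.5, 0.25, 0.25]
channel = [[0.4, 0.6], [0.6, 0.4], [0.6, 0.4]]
print(marginal_guesswork(prior, 0.5))                        # 1
print(conditional_marginal_guesswork(prior, channel, 0.5))   # about 1.5
```

So in this instance $H_\eta(A \mid B) = 1.5 > 1 = H_\eta(A)$.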

3.3.5 Comparison and discussion 

The various notions of entropy discussed in this section have been carefully compared with Shannon entropy, with the conclusion that in general there is no tight relation. Fano's inequality gives a lower bound to the Bayes risk in terms of (conditional) Shannon entropy, and Rényi [R61], Hellman-Raviv [HR07], and Santhi-Vardi [SV06] give upper bounds as well, but all these are rather weak. Smith has shown in [Smi09] that the orderings induced on channels by the Bayes risk and by Shannon entropy are in general unrelated.

Massey has shown that the exponential of the Shannon entropy is a lower bound for the guessing entropy, and that, in the case of a geometric distribution, the bound is tight. Massey has also shown that in the general case the Shannon entropy can be arbitrarily close to 0 while the guessing entropy is constant [Mas94].

As for the marginal guesswork, Pliam has shown that it is essentially unrelated to Shannon entropy [Pli00].

In this thesis we focus on the concepts of leakage based on Shannon entropy 
(Chapter 4) and min-entropy (Chapter 5). 
Information flow in interactive systems



"True interactivity is not about clicking on icons or downloading files, 

it's about encouraging communication." 

Edwin Schlossberg 

The key idea behind the information-theoretic approaches to information flow is to interpret the system as an information-theoretic channel, where the secrets are the input and the observables are the output. The channel matrix consists of the conditional probabilities $p(b \mid a)$, defined as the measure of the executions producing the observable $b$, relative to those which contain the secret $a$. The leakage is represented by the mutual information, and the worst-case leakage by the capacity of the channel (see Chapter 3).

In information theory, however, there are several different models of chan- 
nels. So far the works in the literature about information theory applied to 
information flow have focused on the simplest kind of channels: discrete memo- 
ryless channels where the absence of feedback is implicitly assumed. This clas- 
sical approach has been successfully used in scenarios where the secret value 
is assumed to be chosen at the beginning of the computation. In this chapter, 
however, we are interested in the more general scenario in which secrets can 
be chosen at any point. More precisely, we consider interactive systems, i.e. 
systems in which the generation of secrets and the occurrence of observables 
can alternate during the computation and influence each other. Examples of 
interactive systems include auction protocols like [Vic61, Sub98, SA99]. Some 
of these have become very popular thanks to their integration in Internet-based 
electronic commerce platforms [Eba, Ebi, Mer]. Other examples of interactive 
programs include web servers, GUI applications, and command-line programs 
[BPS+09]. 

Unfortunately, the information-theoretic approach which interprets interactive systems as classical channels is not valid. More specifically, in such systems the channel matrix is not invariant with respect to the input distribution, so the channel capacity cannot be calculated in the traditional way. Therefore, the notion of maximum leakage as standard capacity is also compromised.

The goal of this chapter is to extend the classical information-theoretic 
approach to information flow to the more complicated scenario of interactive 
systems. 

Contribution The main contributions of this chapter can be summarized 
as follows. 

• We show that by considering the richer channels that support memory and feedback it is possible to retrieve the correspondence between systems and channels. We prove that there is a complete correspondence between interactive systems and channels with memory and feedback, and we show how to model the former as the latter.

• We propose the use of directed information, as opposed to mutual in- 
formation, to represent leakage in interactive systems. Recent results in 
information theory [TM09] have shown that, in channels with memory 
and feedback, the transmission rate does not correspond to the maximum
mutual information (the standard notion of capacity), but rather to
the maximum normalized directed information, a concept introduced by 
Massey [Mas90]. We argue that in interactive channels the real leakage is 
due to the directed information from secrets to observables, whereas the 
directed information from observables to secrets (corresponding to feed- 
back) is a characteristic of the system itself and should not be counted 
as leakage. 

• We show that our model is a proper extension of the classical one, i.e. in 
the absence of interactivity the model of channels with memory and feed- 
back collapses into the model of memoryless channels without feedback. 
Moreover, in that case the concepts of mutual information and directed information from input to output also coincide, and the same holds for the concepts of capacity and directed capacity. We argue that in the classical approach mutual information is a good measure of leakage exactly because of this property: in the absence of feedback, mutual information and directed information from input to output are the same.

• We show that the capacity of the channels associated to interactive sys- 
tems is a continuous function with respect to a pseudometric based on 
the Kantorovich metric. The continuity of the channel capacity was also 
proved in [DJGP02] for simple channels, but the proof does not adapt 
to the case of channels with memory and feedback and we had to devise 
a different technique. 

Plan of the Chapter This chapter is organized as follows. In Section 4.1 
we introduce the concept of interactive systems and we show why channels
without memory and feedback are inadequate in this scenario. In Section 4.2
we review the notion of channels with memory and feedback, which is the core 
of the model we propose. We discuss the concept of directed information and 
also the concept of capacity in the presence of feedback. Section 4.3 contains 
the main contribution in this chapter: We explain how Interactive Informa- 
tion Hiding Systems (IIHSs) can be modeled using channels with memory and 
feedback. In particular we show that for any IIHS there is always a channel 
that simulates its probabilistic behavior. In Section 4.4 we discuss our no- 
tion of adversary and we define the quantification of information leakage as 
the channel's directed information from input to output, or as the directed 
capacity, depending on whether the input distribution is fixed or not. In Sec- 
tion 4.5 we apply our model to an example, the Cocaine Auction protocol. In 
Section 4.6 we propose a pseudometric structure on IIHSs based on the Kan- 
torovich metric. We also show that the capacity of the channels associated to 
interactive systems is a continuous function with respect to this pseudometric. 
In Section 4.7 we present some related work, and in Section 4.8 we review and 
discuss the main results of the chapter, and consider future work. 



4.1 Interactive systems 

In this section we exemplify the problems that arise when we try to apply the classical information-theoretic approach to interactive systems. In order to derive an information-theoretic channel, at first glance it would seem natural to define the channel matrix by using the definition of $p(b \mid a)$ in terms of the joint and marginal probabilities $p(a,b)$ and $p(a)$. Namely, the entry $p(b \mid a)$ would be defined as the measure of the traces with (secret, observable)-projection $(a,b)$, divided by the measure of the traces with secret projection $a$. An approach of this kind was proposed in [DJGP02]. In the interactive case, however, this construction does not really produce an information-theoretic channel. In fact, by definition a channel should be invariant with respect to the input distribution, and this is not the case here, as shown by the following example.

Example 1. Figure 4.1 represents a web-based interaction between one seller and two possible buyers, rich and poor. The seller can offer two different products, cheap and expensive, with given probabilities. Once the product is offered, each buyer may try to buy it, with a certain probability. For simplicity we assume that the buyers' offers are mutually exclusive. We assume that the offers are observables, in the sense that they are made public on the website, while the identity of the buyer that actually buys the product should be kept secret from an external observer. The symbols $r, q_1, q_2, \bar{r}, \bar{q}_1, \bar{q}_2$ represent probabilities, with the convention that $\bar{r} = 1 - r$ (and the same for the pairs $q_1, \bar{q}_1$ and $q_2, \bar{q}_2$).
Figure 4.1: Interactive system of Example 1



Following [DJGP02] we can compute the conditional probabilities using $p(b \mid a) = \frac{p(a,b)}{p(a)}$, thus obtaining the matrix in Table 4.1. The matrix, however, is not invariant with respect to the input distribution. For instance, for $r = \bar{r} = \frac{1}{2}$, $q_1 = \frac{2}{3}$, and $q_2 = \frac{1}{3}$ we obtain the matrix in Table 4.2(a). If we change the input distribution, for instance by changing the value of $q_2$ to $\frac{1}{6}$, the matrix also changes. We obtain, indeed, the new matrix illustrated in Table 4.2(b).





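$$\begin{array}{c|cc}
 & \text{cheap} & \text{expensive} \\ \hline
\text{poor} & \dfrac{r q_1}{r q_1 + \bar{r} q_2} & \dfrac{\bar{r} q_2}{r q_1 + \bar{r} q_2} \\[2ex]
\text{rich} & \dfrac{r \bar{q}_1}{r \bar{q}_1 + \bar{r} \bar{q}_2} & \dfrac{\bar{r} \bar{q}_2}{r \bar{q}_1 + \bar{r} \bar{q}_2}
\end{array}$$

Table 4.1: Channel matrix for Example 1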
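The dependence of the matrix on the input distribution can be reproduced with exact arithmetic. The Python sketch below is ours; since Figure 4.1 is not reproduced here, it assumes the reading of the figure implied by Table 4.1: the cheap product is offered with probability $r$; after the cheap offer the poor buyer buys with probability $q_1$ and the rich one with $\bar{q}_1$; after the expensive offer the poor buyer buys with probability $q_2$ and the rich one with $\bar{q}_2$.

```python
from fractions import Fraction


def example1(r, q1, q2):
    """Prior on secrets (buyers) and matrix p(observable | secret) for
    Example 1, computed as p(b | a) = p(a, b) / p(a) following [DJGP02]."""
    joint = {
        ("poor", "cheap"): r * q1,
        ("rich", "cheap"): r * (1 - q1),
        ("poor", "expensive"): (1 - r) * q2,
        ("rich", "expensive"): (1 - r) * (1 - q2),
    }
    prior = {a: joint[(a, "cheap")] + joint[(a, "expensive")] for a in ("poor", "rich")}
    matrix = {(a, b): pr / prior[a] for (a, b), pr in joint.items()}
    return prior, matrix


half = Fraction(1, 2)
print(example1(half, Fraction(2, 3), Fraction(1, 3)))  # values of Table 4.2(a)
print(example1(half, Fraction(2, 3), Fraction(1, 6)))  # values of Table 4.2(b)
```

Changing only $q_2$ changes both the induced input distribution and the entries of the matrix, which is exactly why the matrix cannot be treated as a fixed channel.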



Consequently, when the secrets occur after the observables and depend on 
them, we cannot consider the conditional probabilities (of the observables given 
the secrets) as representing a classical channel from secrets to observables, and 
we cannot apply the standard information-theoretic concepts. In particular, 
we cannot use "the capacity of the matrix" (defined by considering the matrix 
as a channel matrix, and taking the maximum mutual information over all 
possible inputs) because in general the maximum is given by a distribution 
different from the one that was used to define the matrix, hence the result 
would be unsound. 

The first contribution of this chapter is to consider an extension of the 
theory of channels which makes the information-theoretic approach applicable 
also in the case of interactive systems. A richer notion of channels, known in 
information theory as channels with memory and feedback, serves our purposes. 
The dependence of inputs on previous outputs corresponds to feedback, and 
the dependence of outputs on previous inputs and outputs corresponds to 
memory. Recent results in information theory [TM09] have shown that, in such 
channels, the transmission rate does not correspond to the maximum mutual 
information (the standard notion of capacity), but rather to the maximum 
normalized directed information, a concept introduced by Massey [Mas90]. 
We propose to adopt this latter notion to represent leakage. 
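$$\begin{array}{c|cc|c}
 & \text{cheap} & \text{expensive} & \text{Input distr.} \\ \hline
\text{poor} & \frac{2}{3} & \frac{1}{3} & p(\text{poor}) = \frac{1}{2} \\[1ex]
\text{rich} & \frac{1}{3} & \frac{2}{3} & p(\text{rich}) = \frac{1}{2}
\end{array}$$

(a) $r = \bar{r} = \frac{1}{2}$, $q_1 = \frac{2}{3}$, $q_2 = \frac{1}{3}$

$$\begin{array}{c|cc|c}
 & \text{cheap} & \text{expensive} & \text{Input distr.} \\ \hline
\text{poor} & \frac{4}{5} & \frac{1}{5} & p(\text{poor}) = \frac{5}{12} \\[1ex]
\text{rich} & \frac{2}{7} & \frac{5}{7} & p(\text{rich}) = \frac{7}{12}
\end{array}$$

(b) $r = \bar{r} = \frac{1}{2}$, $q_1 = \frac{2}{3}$, $q_2 = \frac{1}{6}$

Table 4.2: Two different channel matrices induced by two different input distributions for Example 1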

Our model of attacker is the interactive version of the attacker associated with Shannon entropy in the classification of Köpf and Basin [KB07], discussed in Chapter 3. In the case of a standard single-use channel, the invulnerability degree of the secret before the attacker observes the output is the entropy of the input, determined by its a priori distribution. The invulnerability degree after the attacker observes the output is the conditional entropy of the input given the output, determined by its a posteriori distribution. The latter is always smaller than or equal to the former. The difference between these invulnerability degrees corresponds to the mutual information, and represents the leakage of the system. In our interactive framework we consider the same scenario, but iterated. At each time step we consider the input sequence so far, and the increase of its vulnerability caused by the observation of the new output is given by the contribution of the present step to the leakage. The sum of all these contributions represents the total leakage and, as we will see, corresponds to Massey's directed information. We will come back to the model of attacker in Section 4.4, and also discuss a variant of this interpretation.
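To make the per-step view concrete, the Python sketch below (an illustration of ours, not the construction of Section 4.3) computes Massey's directed information $I(A^T \to B^T) = \sum_{t=1}^{T} I(A^t; B_t \mid B^{t-1})$ and the ordinary mutual information $I(A^T; B^T)$, in bits, from a finite joint distribution on pairs of sequences represented as tuples. The function names and the toy systems are hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A^T; B^T) in bits; joint maps (alpha, beta) tuple pairs to p(alpha, beta)."""
    p_alpha, p_beta = defaultdict(float), defaultdict(float)
    for (alpha, beta), pr in joint.items():
        p_alpha[alpha] += pr
        p_beta[beta] += pr
    return sum(pr * math.log2(pr / (p_alpha[alpha] * p_beta[beta]))
               for (alpha, beta), pr in joint.items() if pr > 0)


def directed_information(joint):
    """Massey's I(A^T -> B^T) = sum_t I(A^t; B_t | B^{t-1}), in bits."""
    T = len(next(iter(joint))[0])
    total = 0.0
    for t in range(1, T + 1):
        p_ab = defaultdict(float)      # p(a^t, b^t)
        p_a_prev = defaultdict(float)  # p(a^t, b^{t-1})
        p_b = defaultdict(float)       # p(b^t)
        p_prev = defaultdict(float)    # p(b^{t-1})
        for (alpha, beta), pr in joint.items():
            a_t, b_t, b_prev = alpha[:t], beta[:t], beta[:t - 1]
            p_ab[(a_t, b_t)] += pr
            p_a_prev[(a_t, b_prev)] += pr
            p_b[b_t] += pr
            p_prev[b_prev] += pr
        # I(A^t; B_t | B^{t-1}) = sum p(a^t,b^t) log p(a^t,b^t) p(b^{t-1}) / (p(a^t,b^{t-1}) p(b^t))
        for (a_t, b_t), pr in p_ab.items():
            if pr > 0:
                b_prev = b_t[:-1]
                total += pr * math.log2(pr * p_prev[b_prev]
                                        / (p_a_prev[(a_t, b_prev)] * p_b[b_t]))
    return total
```

In a toy system where $B_1$ is a fair coin independent of $A_1$, the next input $A_2$ copies $B_1$ (feedback), and $B_2$ is constant, the mutual information is 1 bit while the directed information is 0: the correlation is entirely due to feedback, not to information flowing from inputs to outputs. Without feedback (for instance a noiseless channel with independent uniform inputs) the two quantities coincide.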

A second contribution of our work is the proof that the channel capacity is a continuous function with respect to a pseudometric on interactive systems based on the Kantorovich metric. We are interested in the continuity of the capacity for computability purposes. Given a function $f$ from a (pseudo)metric space $X$ to a (pseudo)metric space $Y$, the continuity of $f$ means that, given a sequence of objects $x_1, x_2, \ldots \in X$ converging to $x \in X$, the sequence $f(x_1), f(x_2), \ldots \in Y$ converges to $f(x) \in Y$. Hence $f(x)$ can be approximated by the objects $f(x_1), f(x_2), \ldots$. The typical use of this property is in the case of execution trees generated by programs containing loops. Generally the automaton expressing the semantics of the program can be seen as the (metric) limit of the sequence of trees generated by unfolding the loop to an increasingly deeper level. The continuity of the capacity means that we can approximate the real capacity by the capacities of these trees.

4.2 Discrete channels with memory and feedback 

In this section we present the notion of channel with memory and feedback. We assume a scenario in which the channel is used repeatedly, in a finite temporal sequence of steps $1, \ldots, T$. Intuitively, memory means that the output at time $t$ ($1 \leq t \leq T$) depends on the input and output histories, i.e. on the inputs up to time $t$, and on the outputs up to time $t-1$. Feedback means that the input at time $t$ depends on the outputs up to time $t-1$.
We adopt the following notation.

Convention 2. Given sets of symbols (alphabets) $\mathcal{A} = \{a_1, \ldots, a_{|\mathcal{A}|}\}$, $\mathcal{B} = \{b_1, \ldots, b_{|\mathcal{B}|}\}$, we use a Greek letter ($\alpha, \beta, \ldots$) to denote a sequence of symbols ordered in time. Given a sequence $\alpha = a_{i_1} a_{i_2} \ldots a_{i_m}$, the notation $\alpha_t$ represents the symbol at time $t$, i.e. $a_{i_t}$, while $\alpha^t$ represents the sequence $a_{i_1} a_{i_2} \ldots a_{i_t}$. For instance, in the sequence $\alpha = a_3 a_7 a_5$, we have $\alpha_2 = a_7$ and $\alpha^2 = a_3 a_7$. Analogously, if $X$ is a random variable, then $X^t$ denotes the sequence of $t$ consecutive instances $X_1, \ldots, X_t$ of $X$.

We now define formally the concepts of memory and feedback. Consider a channel from input $A$ to output $B$. The channel behavior after $T$ uses can be fully described by the joint distribution of $A^T \times B^T$, namely by the probabilities $p(\alpha^T, \beta^T)$. Using the chain rule, we can decompose these probabilities as follows:

$$p(\alpha^T, \beta^T) = \prod_{t=1}^{T} p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) \, p(\beta_t \mid \alpha^t, \beta^{t-1}) \qquad (4.1)$$

Definition 3. We say that a channel has feedback if, in general, $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) \neq p(\alpha_t \mid \alpha^{t-1})$, i.e. the probability of $\alpha_t$ depends not only on $\alpha^{t-1}$, but also on $\beta^{t-1}$. Analogously, we say that the channel has memory if, in general, $p(\beta_t \mid \alpha^t, \beta^{t-1}) \neq p(\beta_t \mid \alpha_t)$, i.e. the probability of $\beta_t$ depends on $\alpha^t$ and $\beta^{t-1}$.

Note that in the opposite case, i.e. when $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})$ coincides with $p(\alpha_t \mid \alpha^{t-1})$ and $p(\beta_t \mid \alpha^t, \beta^{t-1})$ coincides with $p(\beta_t \mid \alpha_t)$, we have a classical channel (memoryless, and without feedback), in which each use is independent from the previous ones. The only possible dependency on the history is the one of $\alpha_t$ on $\alpha^{t-1}$. This is because $A_1, \ldots, A_T$ are in general correlated, due to the fact that they are produced by an encoding function. Note that in the absence of memory and feedback (4.1) reduces to:
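$$\begin{aligned}
p(\alpha^T, \beta^T) &= \prod_{t=1}^{T} p(\alpha_t \mid \alpha^{t-1}) \, p(\beta_t \mid \alpha_t) \\
&= p(\alpha^T) \prod_{t=1}^{T} p(\beta_t \mid \alpha_t) \quad \text{(by the chain rule)} \qquad (4.2)
\end{aligned}$$

from which we can derive the standard formula for a classical channel after $T$ uses:

$$p(\beta^T \mid \alpha^T) = \frac{p(\alpha^T, \beta^T)}{p(\alpha^T)} = \prod_{t=1}^{T} p(\beta_t \mid \alpha_t) \quad \text{(by (4.2))}$$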
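The factorization (4.1) can also be read as a recipe for building the joint distribution step by step. The Python sketch below is ours (the names `input_prob` and `kernel` are hypothetical); it multiplies, at each time $t$, the input factor $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})$ by the kernel $p(\beta_t \mid \alpha^t, \beta^{t-1})$, with histories represented as tuples.

```python
def joint_from_chain_rule(inputs, outputs, T, input_prob, kernel):
    """Build p(alpha^T, beta^T) by multiplying the factors of (4.1).

    input_prob(a, alpha_prev, beta_prev) -> p(alpha_t = a | alpha^{t-1}, beta^{t-1})
    kernel(b, alpha_hist, beta_prev)     -> p(beta_t = b | alpha^t, beta^{t-1})
    Returns a dict mapping (alpha^T, beta^T) tuples to probabilities."""
    joint = {((), ()): 1.0}
    for _ in range(T):
        extended = {}
        for (alpha, beta), pr in joint.items():
            for a in inputs:
                p_in = input_prob(a, alpha, beta)
                if p_in == 0:
                    continue
                for b in outputs:
                    p_out = kernel(b, alpha + (a,), beta)
                    if p_out > 0:
                        extended[(alpha + (a,), beta + (b,))] = pr * p_in * p_out
        joint = extended
    return joint
```

When `input_prob` ignores `beta_prev` and `kernel` looks only at the last input `alpha_hist[-1]`, the result agrees with (4.2); letting `input_prob` read `beta_prev` models feedback, and letting `kernel` inspect the whole histories models memory.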

So far we have given a very abstract description of a channel with memory and feedback. We now discuss a more concrete notion following the presentation of [TM09]. Such a channel, represented in Figure 4.2, consists of a sequence of components formally defined as a family of stochastic kernels $\{p(\cdot \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ over $\mathcal{B}$.

The probabilities $p(\beta_t \mid \alpha^t, \beta^{t-1})$ represent the innermost behavior of the channel at time $t$, $1 \leq t \leq T$: the internal channel takes the input $\alpha_t$ and, depending on the history of inputs and outputs so far, it produces an output symbol $\beta_t$. The output is then fed back to the encoder with delay one. On the input side, at time $t$ the encoder takes the message and the past output symbols $\beta^{t-1}$ and produces a channel input symbol $\alpha_t$ according to the code function $\varphi_t$ (we will explain this concept in the next paragraph). At final time $T$ the decoder takes all the channel outputs $\beta^T$ and produces the decoded message $\hat{W}$. The order in time is the following:

Message $W$, $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$, $\ldots$, $\alpha_T$, $\beta_T$, Decoded message $\hat{W}$

Let us now explain the concept of code function. Intuitively, a code function is a strategy to encode the message into a suitable representation to be transmitted through the channel. There is a code function for each possible message, and the functions are fixed at the very beginning of the transmission (time $t = 0$). The encoding, however, can use the information provided via feedback, so each component $\varphi_t$ ($1 \leq t \leq T$) of the code function takes as parameter the history of feedback $\beta^{t-1}$ to generate the next input symbol $\alpha_t$.

Formally, let $\Phi_t$ be the set of all measurable maps $\varphi_t : \mathcal{B}^{t-1} \to \mathcal{A}$ endowed with a probability distribution, and let $F_t$ be the corresponding random variable. Let $\Phi^T$, $F^T$ denote the Cartesian product on the domain and the random variable, respectively. A channel code function is an element $\varphi^T = (\varphi_1, \ldots, \varphi_T) \in \Phi^T$.

Note that, by the chain rule, $p(\varphi^T) = \prod_{t=1}^{T} p(\varphi_t \mid \varphi^{t-1})$. Hence the distribution on $\Phi^T$ is uniquely determined by a sequence $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$. The notation $\varphi^t(\beta^{t-1})$ will represent the $\mathcal{A}$-valued $t$-tuple $(\varphi_1, \varphi_2(\beta^1), \ldots, \varphi_t(\beta^{t-1}))$.

In Information Theory this kind of channel is used to encode and transmit messages. If $\mathcal{W}$ is a set of messages of cardinality $M$ with typical element $w$, endowed with a probability distribution, a channel code is a set of $M$ channel code functions $\varphi^T[w]$, interpreted as follows: for message $w$, if at time $t$ the channel feedback is $\beta^{t-1}$, then the channel encoder outputs $\varphi_t[w](\beta^{t-1})$. A channel decoder is a map from $\mathcal{B}^T$ to $\mathcal{W}$ which attempts to reconstruct the input message after observing all the output history $\beta^T$ from the channel.

Figure 4.2: Model for discrete channel with memory and feedback (code functions fixed at time 0; encoder $\{\varphi_t(\beta^{t-1})\}_{t=1}^{T}$ and channel at times $t = 1, \ldots, T$, with the output fed back to the encoder through a delay; decoder at time $T+1$)



4.2.1 The power of feedback 

The original purpose of communication channel models is to represent data 
transmission from a source to a receiver. Shannon's Channel Coding Theo- 
rem states that for every channel there is an encoding scheme that allows a 
transmission rate arbitrarily close to the channel capacity with a negligible 
probability of error (if the number of uses of the channel is large enough). A 
general way to find an optimal encoding scheme that is also easy to decode 
has not been found yet. The use of feedback, however, can simplify the design 
of the encoder and of the decoder. The following example illustrates the idea. 








$$\begin{array}{c|ccc}
 & 0 & 1 & e \\ \hline
0 & 0.8 & 0 & 0.2 \\
1 & 0 & 0.8 & 0.2
\end{array}$$

Table 4.3: Channel matrix for binary erasure channel



Example 2. Consider a discrete memoryless binary channel $(\mathcal{A}, \mathcal{B}, p(\cdot \mid \cdot))$ with $\mathcal{A} = \{0, 1\}$, $\mathcal{B} = \{0, 1, e\}$ and the channel matrix of Table 4.3. This kind of channel is called an erasure channel because it can lose (or erase) bits during the transmission with a certain probability. Namely, any bit has probability 0.8 of being correctly transmitted, and probability 0.2 of being lost. On the output side the receiver is able to detect whether the bit was erased (by receiving an $e$ symbol), but it cannot tell which was the actual value of the original bit. The Channel Coding Theorem guarantees that the maximum information transmission rate in this channel is the channel capacity, i.e. 0.8 bits per use of the channel.

Following simple principles described in [CT06], an encoding that achieves the capacity can easily be obtained if the channel can be used with feedback. The idea is an adaptation of the stop-and-wait protocol [Sta06, Tan89]. Suppose that every bit received on the output end of the channel is fed back noiselessly to the source with delay 1. Define the encoding as follows: for each bit transmitted, the encoder checks via feedback whether the bit was erased. If not, the encoder moves on to transmit the next bit of the message. If yes, the encoder transmits the same bit again.

It is easy to see that with this encoding scheme the transmission rate is 0.8 bits per use of the channel: in 80% of the cases the bit is transmitted properly, and in 20% it is lost and a retransmission is needed, so each bit requires on average $1/0.8 = 1.25$ uses of the channel.
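A small Monte Carlo sketch in Python (ours; the function name, the seed and the message length are arbitrary choices) simulates the scheme just described, with noiseless delay-one feedback, and checks that the empirical rate is close to 0.8 bits per use.

```python
import random


def stop_and_wait(message, erasure_prob=0.2, seed=0):
    """Send a list of bits over an erasure channel with noiseless feedback.

    Returns (decoded bits, number of channel uses)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    decoded, uses, i = [], 0, 0
    while i < len(message):
        uses += 1
        output = "e" if rng.random() < erasure_prob else message[i]
        if output != "e":      # feedback tells the encoder the bit got through
            decoded.append(output)
            i += 1             # move on to the next bit of the message
        # otherwise the same bit is retransmitted at the next use
    return decoded, uses


rng = random.Random(1)
msg = [rng.randint(0, 1) for _ in range(100_000)]
decoded, uses = stop_and_wait(msg)
print(decoded == msg, len(msg) / uses)  # True, and a rate close to 0.8
```

Every bit is eventually delivered, so the decoder simply concatenates the non-erased outputs; the price of feedback-based retransmission is paid only in the number of channel uses.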



We now proceed to illustrate in more detail the design and the function of 
the encoder and decoder. 



An example illustrating the encoder/decoder design

We proceed with the erasure channel of Example 2 to show how the enriched model of channels with memory and feedback can be used to transmit the message, and in particular how the feedback can be used to design the encoder. We assume that the set $\mathcal{W}$ of possible messages consists of all finite sequences of bits. The role of the code functions is to encode the message $W$ into a suitable representation for the stochastic kernels within the channel. The input and output alphabets for the stochastic kernels are $\mathcal{A} = \{0, 1\}$ and $\mathcal{B} = \{0, 1, e\}$, respectively. We assume that at most $T$ uses of the channel are allowed and we use $t$, with $1 \leq t \leq T$, to represent the $t$-th time step.

We consider a sort of memory that depends only on the input history and we abstract from its specific form by defining a function $\eta$ that maps each possible input history $\alpha^{t-1}$ to a correction factor in $[0, 1]$, to be added to (or subtracted from) a base probability value. We compute the contribution of $\eta$ to the base values using arithmetic modulo 2, in such a way that the resulting values are still a probability distribution. More precisely, the stochastic kernels are defined as follows.





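$$\begin{aligned}
p(\beta_t = 0 \mid \alpha^{t-1}, \alpha_t = 0, \beta^{t-1}) &= 0.8 - \eta(\alpha^{t-1}) \\
p(\beta_t = 1 \mid \alpha^{t-1}, \alpha_t = 0, \beta^{t-1}) &= 0 \\
p(\beta_t = e \mid \alpha^{t-1}, \alpha_t = 0, \beta^{t-1}) &= 0.2 + \eta(\alpha^{t-1}) \\
p(\beta_t = 0 \mid \alpha^{t-1}, \alpha_t = 1, \beta^{t-1}) &= 0 \\
p(\beta_t = 1 \mid \alpha^{t-1}, \alpha_t = 1, \beta^{t-1}) &= 0.8 - \eta(\alpha^{t-1}) \\
p(\beta_t = e \mid \alpha^{t-1}, \alpha_t = 1, \beta^{t-1}) &= 0.2 + \eta(\alpha^{t-1})
\end{aligned} \qquad (4.3)$$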



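Correspondingly, the general form of the channel matrix for each time $1 \leq t \leq T$ is shown in Table 4.4.

$$\begin{array}{c|ccc}
 & 0 & 1 & e \\ \hline
0 & 0.8 - \eta(\alpha^{t-1}) & 0 & 0.2 + \eta(\alpha^{t-1}) \\
1 & 0 & 0.8 - \eta(\alpha^{t-1}) & 0.2 + \eta(\alpha^{t-1})
\end{array}$$

Table 4.4: General form of channel matrix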



The code functions are chosen at time i = 0, based on the message to be 
transmitted. For illustration purposes, let us suppose that the message is the 
sequence of three bits W = Oil. The other cases of W are analogous. 

At time $t = 1$, the channel is used for the first time and the feedback history so far is empty, $\beta^0 = \epsilon$. The encoder selects the input symbol $\alpha_1 = 0$, as in (4.4).

$$f_1[W = 011](\beta^0 = \epsilon) = 0 \tag{4.4}$$

At time $t = 2$, the feedback history consists of only one symbol, and in principle the possibilities are $\beta^1 = 0$, $\beta^1 = 1$ or $\beta^1 = e$. In the first case, the first bit was successfully transmitted and the encoder can go on to the second bit of the message. By the way the channel is defined, the second case is not actually possible, so it does not matter how the code function is defined for this case. We denote this indifference by attributing to the function the symbol $\times$ instead of a 0 or a 1. In the last case, $\beta^1 = e$, the first bit was erased and the encoder tries to retransmit the bit 0. Formally:

$$
\begin{aligned}
f_2[W = 011](\beta^1 = 0) &= 1\\
f_2[W = 011](\beta^1 = 1) &= \times\\
f_2[W = 011](\beta^1 = e) &= 0
\end{aligned}
\tag{4.5}
$$

At time $t = 3$, the feedback histories allowed by the channel are $\beta^2 \in \{01, 0e, e0, ee\}$ (the other ones have zero probability). In the first case, $\beta^2 = 01$, the first two bits of the message have been transmitted correctly and the encoder can send the third bit. If $\beta^2 = 0e$, the transmission of the first bit was successful, but the second bit was erased and needs to be resent. In the case $\beta^2 = e0$, the first bit was erased in the first try but was successfully transmitted in the second try, so now the encoder can move to the second bit of the message. In the last case, $\beta^2 = ee$, both tries were unsuccessful and the encoder still needs to transmit the first bit of the message. Formally:

$$
\begin{aligned}
f_3[W = 011](\beta^2 = 00) &= \times & f_3[W = 011](\beta^2 = 10) &= \times & f_3[W = 011](\beta^2 = e0) &= 1\\
f_3[W = 011](\beta^2 = 01) &= 1 & f_3[W = 011](\beta^2 = 11) &= \times & f_3[W = 011](\beta^2 = e1) &= \times\\
f_3[W = 011](\beta^2 = 0e) &= 1 & f_3[W = 011](\beta^2 = 1e) &= \times & f_3[W = 011](\beta^2 = ee) &= 0
\end{aligned}
\tag{4.6}
$$



We can easily extend the construction of code functions to $3 < t \le T$ using this encoding scheme.

The decoder is very simple: once all time steps 1, . . . , T have taken place, 
it just takes the whole output trace and removes the occurrences of the 
erased bit symbol e in order to recover the original message. 
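The encoder of (4.4) to (4.6) and the decoder just described can be sketched in Python as follows. This is one possible generalization to $3 < t \le T$ under stated assumptions, not necessarily the author's: `code_function` counts the non-erased symbols in the feedback history to find the next bit to (re)send, and returns the don't-care symbol `x` for histories that are impossible for the message. What the encoder does once the whole message has been delivered is not specified in the text; returning `x` (and stopping) is our own choice. Erasures are passed in as a fixed pattern instead of being sampled from the kernels, so that the run of Table 4.5 can be replayed deterministically.

```python
def code_function(message: str, feedback: str) -> str:
    """f_t[W](beta^{t-1}): the next channel input, given the feedback so far."""
    delivered = feedback.replace("e", "")      # bits that got through
    if not message.startswith(delivered):
        return "x"                              # impossible history: don't care
    if len(delivered) == len(message):
        return "x"                              # assumption: nothing left to send
    return message[len(delivered)]              # (re)send the first pending bit


def decode(outputs: str) -> str:
    """Decoder: remove the erasure symbols from the whole output trace."""
    return outputs.replace("e", "")


def run(message: str, erased: list) -> tuple:
    """Use the channel len(erased) times; erased[i] says whether use i+1 is erased."""
    inputs, outputs = "", ""
    for lost in erased:
        alpha = code_function(message, outputs)  # encoder sees the feedback
        if alpha == "x":
            break                                # message already delivered
        inputs += alpha
        outputs += "e" if lost else alpha        # erasure channel: bit or e
    return inputs, outputs, decode(outputs)


# Replaying Table 4.5: W = 011, T = 3, only the first use is erased.
print(run("011", [True, False, False]))          # ('001', 'e01', '01')
```

Every entry of (4.4), (4.5) and (4.6), including the $\times$ entries, is reproduced by `code_function("011", history)`.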

Table 4.5 shows a possible behavior of a binary erasure channel with memory and feedback in a scenario where the message is $W = 011$ and the channel can be used at most $T = 3$ times. Note that in this particular example the maximum number of uses of the channel is reached before the whole message is successfully sent: the decoder can recover only the first two bits of the original message.

We can observe that the channel capacity in the above example does not increase with the addition of feedback (it is 0.8 bits per use of the channel, with or without feedback). This is because the channel is memoryless: feedback does not increase the capacity of discrete memoryless channels [CT06]. In general, however, feedback can increase the capacity of channels with memory.
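The figure of 0.8 bits per use can be checked numerically in the memoryless case, i.e. assuming $\eta \equiv 0$, so that every use erases with probability 0.2. The sketch below uses a plain grid search over input distributions $(\pi, 1 - \pi)$ rather than a dedicated capacity algorithm such as Blahut-Arimoto, and maximizes the single-use mutual information $I(A; B)$; it does not model feedback.

```python
from math import log2

# Memoryless binary erasure channel (eta = 0): rows are inputs 0, 1;
# columns are outputs 0, 1, e.
CHANNEL = [[0.8, 0.0, 0.2],
           [0.0, 0.8, 0.2]]


def mutual_information(prior, channel):
    """I(A;B) in bits for input distribution `prior` and channel matrix `channel`."""
    p_out = [sum(prior[a] * channel[a][b] for a in range(len(prior)))
             for b in range(len(channel[0]))]
    return sum(prior[a] * channel[a][b] * log2(channel[a][b] / p_out[b])
               for a in range(len(prior)) for b in range(len(p_out))
               if prior[a] > 0 and channel[a][b] > 0)


def capacity_grid(channel, steps=1000):
    """Approximate capacity of a binary-input channel by a grid search on pi."""
    return max(mutual_information([i / steps, 1 - i / steps], channel)
               for i in range(steps + 1))


print(capacity_grid(CHANNEL))  # about 0.8 bits per use, attained at pi = 0.5
```

For this channel $I(A; B) = 0.8\, H(\pi)$, so the maximum is reached at the uniform input.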

4.2.2 Directed information and capacity of channels with feedback

In classical Information Theory, the channel capacity, which is related to the channel's transmission rate by Shannon's Channel Coding Theorem, can be obtained as the supremum of the mutual information over all possible input distributions. In the presence of feedback, however, this correspondence no longer holds. More specifically, mutual information no longer represents the information flow from $A^T$ to $B^T$. Intuitively, this is due to the fact that mutual information expresses correlation, and therefore it can be increased by feedback (Example 5 in Section 4.4 illustrates this fact). Yet feedback, i.e. the way the output influences the next input, is not part of the information to be transmitted. If we want to maintain the correspondence between the transmission rate and capacity, we need to replace the mutual information with directed information [Mas90].

$$
\begin{array}{l|c|c|c|c}
 & t = 1 & t = 2 & t = 3 & t = 4\\ \hline
\text{Code function} & \text{as in (4.4)} & \text{as in (4.5)} & \text{as in (4.6)} & \\
\text{Feedback history} & \beta^0 = \epsilon & \beta^1 = e & \beta^2 = e0 & \\
\text{Input} & \alpha_1 = f_1[W = 011](\epsilon) = 0 & \alpha_2 = f_2[W = 011](e) = 0 & \alpha_3 = f_3[W = 011](e0) = 1 & \\
\text{Output} & p(\beta_1 \mid 0, \epsilon) \text{ produces } \beta_1 = e & p(\beta_2 \mid 00, e) \text{ produces } \beta_2 = 0 & p(\beta_3 \mid 001, e0) \text{ produces } \beta_3 = 1 & \\
\text{Decoded message} & & & & \text{from } \beta^3 = e01\text{: } \hat{W} = 01
\end{array}
$$

Table 4.5: A possible evolution of the binary channel with time, for $W = 011$ and $T = 3$

Definition 4. In a channel with feedback, the directed information from input $A^T$ to output $B^T$ is defined as

$$I(A^T \to B^T) = \sum_{t=1}^{T} I(A^t; B_t \mid B^{t-1})$$

In the other direction, the directed information from $B^T$ to $A^T$ is defined as

$$I(B^T \to A^T) = \sum_{t=1}^{T} I(A_t; B^{t-1} \mid A^{t-1})$$
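Both quantities in Definition 4 can be evaluated by brute force when the joint distribution of $(A^T, B^T)$ is given explicitly. The Python sketch below is our own illustration, not an algorithm from [Mas90] or [TM09]: it stores the joint distribution as a dictionary from pairs of tuples $(\alpha^T, \beta^T)$ to probabilities, and computes each conditional mutual information via the identity $I(X; Y \mid Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)$. The usage example, a binary symmetric channel with crossover probability 0.1 (an assumption of the example) used twice without feedback on independent uniform inputs, checks the expected special case: without feedback $I(A^T \to B^T)$ coincides with $I(A^T; B^T)$ and $I(B^T \to A^T) = 0$.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(joint, key):
    """Entropy (bits) of the random variable key(alphas, betas)."""
    marg = defaultdict(float)
    for (alphas, betas), p in joint.items():
        marg[key(alphas, betas)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(joint, lambda a, b: (x(a, b), z(a, b)))
            + entropy(joint, lambda a, b: (y(a, b), z(a, b)))
            - entropy(joint, lambda a, b: (x(a, b), y(a, b), z(a, b)))
            - entropy(joint, z))


def directed_ab(joint, T):
    """I(A^T -> B^T) = sum_t I(A^t; B_t | B^{t-1})."""
    return sum(cond_mi(joint, lambda a, b, t=t: a[:t],
                       lambda a, b, t=t: b[t - 1],
                       lambda a, b, t=t: b[:t - 1]) for t in range(1, T + 1))


def directed_ba(joint, T):
    """I(B^T -> A^T) = sum_t I(A_t; B^{t-1} | A^{t-1})."""
    return sum(cond_mi(joint, lambda a, b, t=t: a[t - 1],
                       lambda a, b, t=t: b[:t - 1],
                       lambda a, b, t=t: a[:t - 1]) for t in range(1, T + 1))


def mutual(joint):
    """I(A^T; B^T)."""
    return cond_mi(joint, lambda a, b: a, lambda a, b: b, lambda a, b: ())


# Example: binary symmetric channel, crossover 0.1, used T = 2 times
# without feedback, with independent uniform inputs.
T, EPS = 2, 0.1
joint = {}
for alphas in product((0, 1), repeat=T):
    for betas in product((0, 1), repeat=T):
        prob = 0.5 ** T
        for a, b in zip(alphas, betas):
            prob *= (1 - EPS) if a == b else EPS
        joint[(alphas, betas)] = prob

print(mutual(joint), directed_ab(joint, T), directed_ba(joint, T))
```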

In Section 4.4 we will discuss the relation between directed information and 
mutual information, as well as the correspondence with information leakage. 
For the moment, we only present the extension of the concept of capacity. 

Let $\mathcal{D}_T = \{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}$ be the set of all input distributions in the presence of feedback. For finite $T$, the capacity of a channel with memory and feedback is:

$$C_T = \sup_{\mathcal{D}_T} \frac{1}{T}\, I(A^T \to B^T) \tag{4.7}$$

The capacity is also defined when T is infinite, see [TM09]. In this thesis, 
however, we only need to consider the finite case. 

4.3 Interactive systems as channels with memory and feedback

Interactive Information Hiding Systems (IIHSs) were introduced in [APvRS10] to represent systems where secrets (inputs) and observables (outputs) can interleave and influence each other. They are a variant of probabilistic automata in which actions are divided into secrets and observables. They can be of two kinds: fully probabilistic, and secret-nondeterministic (or input-nondeterministic). In the former there is no nondeterminism, while in the latter every secret choice is fully nondeterministic. In this chapter we consider normalized IIHSs, in which secrets and observables alternate, and the actions at the first level are secrets. We note that this is not really a restriction, because given an IIHS which is not normalized, it is always possible to transform it into a normalized IIHS which is equivalent to the former one up to a given execution level. The formal definition of this transformation is given later in this section. Furthermore, we require that for each state $s$ and each action $\ell$ there is at most one state that can be reached from $s$ by performing an $\ell$-transition.

In this section we formalize the notion of IIHS and we show how to associate 
to an IIHS a channel with memory and feedback. 

Definition 5. A (normalized) IIHS is a triple $\mathcal{I} = (M, A, B)$, where $A$ and $B$ are disjoint sets of secrets and observables respectively, $M$ is a probabilistic automaton $(S, \mathcal{L}, \hat{s}, \vartheta)$ with $\mathcal{L} = A \cup B$, and, for each $s \in S$:

1. either $\vartheta(s) \subseteq \mathcal{D}(A \times S)$ or $\vartheta(s) \subseteq \mathcal{D}(B \times S)$. We call $s$ a secret state in the first case, and an observable state in the second case;

2. if $s \xrightarrow{\ell} r$ then: if $s$ is a secret state then $r$ is an observable state, and if $s$ is an observable state then $r$ is a secret state;

3. $\hat{s}$ is a secret state;

4. if $s$ is an observable state then $|\vartheta(s)| \le 1$;

5. either:

(i) for every secret state $s$ we have $|\vartheta(s)| \le 1$ (fully probabilistic IIHS), or

(ii) for every secret state $s$ there exist $a_i$ and $s_i$ ($i = 1, \dots, n$) such that $\vartheta(s) = \{\delta(a_i, s_i)\}_{i=1}^{n}$, where $\delta(a_i, s_i)$ is the Dirac measure (secret-nondeterministic IIHS);

6. for every state $s$ and action $\ell$ there is at most one state $r$ such that $s \xrightarrow{\ell} r$.

In the rest of the chapter we will omit the adjective "normalized" for simplicity. In the above definition, Conditions 1 and 2 imply that the IIHS alternates between secrets and observables. Moreover, all the transitions between nodes at two consecutive depths have either secret actions only, or observable actions only. Condition 3 means that the first level contains secret actions. Condition 4 means that all observable transitions are fully probabilistic. Condition 5 means that either all secret transitions are fully probabilistic, or they are all fully nondeterministic. The term "nondeterministic" is justified by the fact that the scheme of Condition 5(ii), represented in Figure 4.3(a), is equivalent to the one of Figure 4.3(b).

Figure 4.3: Scheme of secret transitions for secret-nondeterministic IIHSs: (a) nondeterministic input using Dirac measures; (b) equivalent scheme

Note that we do not consider here internal nondeterminism, which can arise from interleaving of concurrent processes. This means that we make a rather restricted use of probabilistic automata, but this is enough for our purposes. The nondeterminism generated by concurrency gives rise to a new set of problems (see for example [CPP08a]) which are orthogonal to those considered in this chapter.

Condition 6 means that the secret and observable actions determine the states. As a consequence, the actions are enough to retrieve the path. This is expressed by the following proposition:

Proposition 6. Given an IIHS, consider two paths $\sigma$ and $\sigma'$. If $\mathit{trace}_A(\sigma) = \mathit{trace}_A(\sigma')$ and $\mathit{trace}_B(\sigma) = \mathit{trace}_B(\sigma')$, then $\sigma = \sigma'$.

Proof. By induction on the length of the traces. The initial state of the automaton is uniquely determined by the empty (secret and observable) traces. Assume now we are in a state $s$ uniquely determined by secret and observable traces $\alpha$ and $\beta$, respectively. If $s$ makes a secret transition $s \xrightarrow{a} s'$, then by Condition 6 there is only one state $s'$ reachable from $s$ via an $a$-transition, and therefore $s'$ is uniquely determined by the secret trace $\alpha' = \alpha a$ and the observable trace $\beta$. The case in which $s$ makes an observable transition is similar. □

The normalization of IIHS trees 

In this section we address the problem of normalizing an IIHS, namely transforming it into a stratified automaton in which secret and observable actions alternate level by level. The normalization process described below is general enough to be applied to any IIHS without loss of generality or expressive power.

Let $A$ and $B$ represent the secret and observable actions, respectively. Consider a general IIHS $\mathcal{I} = (M, A, B)$ with $M = (Q, \mathcal{L}, \hat{s}, \vartheta)$, where $\mathcal{L} = A \cup B$. Assume that we are only interested in executions that involve up to $T$ interactions, i.e. $T$ uses of the system, with one secret taking place and one observable produced at each time.

In the normalization process, we unfold the automaton up to level $2T$, since there is one secret symbol and one observable symbol for each step. We also extend the secret alphabet $A$ with a new symbol $a_\tau \notin A$ and the observable alphabet $B$ with a new symbol $b_\tau \notin B$. These new symbols will be used as placeholders when we need to re-balance the tree. Let $A' = A \cup \{a_\tau\}$ and $B' = B \cup \{b_\tau\}$.

For a given level $t$, let $\mathit{labels}(\mathcal{I}, t)$ be the set of all labels of transitions that can be performed with a non-zero probability from the states at the $t$-th level of the automaton. Formally:

$$\mathit{labels}(\mathcal{I}, t) = \{\ell \in \mathcal{L} \mid \exists \sigma, s.\ |\sigma| = t,\ \mathit{last}(\sigma) \xrightarrow{\ell} s\}$$

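The normalization of the IIHS $\mathcal{I}$ leads to an equivalent IIHS $\mathcal{I}' = (M', A', B')$, where $M' = (Q', \mathcal{L}', \hat{s}', \vartheta')$ and $\mathcal{L}' = A' \cup B'$, such that, for every $1 \le t \le 2T$:

1. $\mathit{labels}(\mathcal{I}', t) \subseteq A'$ or $\mathit{labels}(\mathcal{I}', t) \subseteq B'$;

2. $\mathit{labels}(\mathcal{I}', t) \subseteq A'$ if and only if $\mathit{labels}(\mathcal{I}', t+1) \subseteq B'$, for $1 \le t \le 2T - 1$;

3. $\mathit{labels}(\mathcal{I}', 1) \subseteq A'$.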

Condition 1 states that each level consists of either the secret actions only, 
or the observable actions only. Condition 2 states that secret and observable 
levels alternate. Condition 3 says that the automaton starts with a secret level. 

The proof is straightforward. First, the new symbols $a_\tau$ and $b_\tau$ are placeholders for the absence of a secret and an observable symbol, respectively. If at a given level $t$ we want to have only secret symbols, we can postpone the occurrences of observable symbols at this level as follows: add $a_\tau$ to the secret level and "move" all the observable symbols to the subtree of $a_\tau$. Figure 4.4 exemplifies the local transformations we need to make on the tree.




Figure 4.4: Local transformation in an IIHS tree: (a) local nodes of the tree before the transformation; (b) local nodes of the tree after the transformation

Note that in Figure 4.4(b) the introduction of new nodes changed the probabilities of the transitions in the tree. In general, whenever we need to introduce $a_\tau$ in order to postpone the observable symbols, the probabilities change as follows:

1. For every $a_i$, $1 \le i \le n$, the associated probability $p_{a_i}$ is maintained.

2. The probability of the new symbol is introduced as $p_{a_\tau} = \sum_{k=1}^{m} p_{b_k}$.

3. If $p_{a_\tau} \neq 0$, then for $1 \le i \le m$, the associated probability of $b_i$ is updated to $p'_{b_i} = p_{b_i} / p_{a_\tau} = p_{b_i} / \sum_{k=1}^{m} p_{b_k}$. If $p_{a_\tau} = 0$, then $p'_{b_i} = 0$ for $1 \le i \le m$, and $p_{b_\tau} = 1$.
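The three update rules can be written as a small Python function acting on a single node. In this sketch (the function name and the list-based representation are our own), `p_secrets` holds $p_{a_1}, \dots, p_{a_n}$ and `p_observables` holds $p_{b_1}, \dots, p_{b_m}$; the result gives the probabilities at the node, now including $a_\tau$, and in the new subtree below $a_\tau$, now including $b_\tau$. Rule 3 does not mention $b_\tau$ when $p_{a_\tau} \neq 0$; the sketch assumes it then gets probability 0.

```python
def postpone_observables(p_secrets, p_observables):
    """Apply rules 1-3 at a node whose outgoing transitions mix secrets
    a_1..a_n (probabilities p_secrets) and observables b_1..b_m (p_observables).

    Returns ([p_a1, ..., p_an, p_a_tau], [p'_b1, ..., p'_bm, p_b_tau])."""
    p_tau = sum(p_observables)                  # rule 2
    secrets = list(p_secrets) + [p_tau]         # rule 1: p_{a_i} unchanged
    if p_tau != 0:                              # rule 3, normal case
        observables = [p / p_tau for p in p_observables] + [0.0]
    else:                                       # rule 3, degenerate case
        observables = [0.0] * len(p_observables) + [1.0]
    return secrets, observables


print(postpone_observables([0.3, 0.2], [0.25, 0.25]))
# ([0.3, 0.2, 0.5], [0.5, 0.5, 0.0])
```

Multiplying $p_{a_\tau}$ by each new $p'_{b_i}$ gives back the original $p_{b_i}$, so the probability of every path through the postponed observables is unchanged.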

The subtrees of each node of the original tree are preserved as they are, until we apply the same transformation to them. If a node does not have a subtree (i.e. no descendants), we create a subtree by adding all the possible actions in $B$ with probability 0, and the action $b_\tau$ with probability 1.

If we are normalizing an observable level, the same rules apply, with the roles of secrets and observables exchanged. We then proceed in the same way on the deeper levels of the tree. Figure 4.5 shows an example of a full transformation on a tree (for the sake of readability, we omit the levels where only $p_{a_\tau} = 1$ or $p_{b_\tau} = 1$).

4.3.1 Construction of the channel associated to an IIHS 

We now show how to associate a channel to an IIHS. 

In an interactive system secrets and observables may interleave and influ- 
ence each other. Considering a channel with memory and feedback is a way 
to capture this rich behavior. Secrets have a causal influence on observables 
via the channel, and, in the presence of interactivity, observables have a causal
influence on secrets via feedback. This alternating mutual influence between
secrets and observables can be modeled by repeated uses of the channel. Each 
time the channel is used it represents a different state of the computation, and 
the conditional probabilities of observables on secrets can depend on this state. 
The addition of memory to the model allows expressing the dependency of the 
channel matrix on such a state. 

We will see that a secret-nondeterministic IIHS determines a channel as 
specified by its stochastic kernels, while a fully probabilistic IIHS determines, 
additionally, the input distribution. 

In Section 4.5 we will give an extensive and detailed example of how to 
make such a construction for an actual security protocol. 

Given a path $\sigma$ of length $2t - 1$, we will denote $\mathit{trace}_A(\sigma)$ by $\alpha^t$, and $\mathit{trace}_B(\sigma)$ by $\beta^{t-1}$.

Definition 7. Let $\mathcal{I}$ be an IIHS. For each $t$, the channel's stochastic kernel corresponding to $\mathcal{I}$ is defined as $p(\beta_t \mid \alpha^t, \beta^{t-1}) = \vartheta(s)(\beta_t, s')$, where $s$ is the state reached from the root via the path $\sigma$ whose secret and observable traces are $\alpha^t$ and $\beta^{t-1}$ respectively.

Note that $s$ and $s'$ in the previous definition are well defined: by Proposition 6, $s$ is unique, and since the choice of $\beta_t$ is fully probabilistic, $s'$ is also unique.
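Definition 7, together with Proposition 6, suggests a direct procedure: follow the interleaved traces from the root to reach $s$, then read off the probability of the transition labeled $\beta_t$. The Python sketch below assumes a dictionary-based encoding of the automaton and uses a small hypothetical IIHS (not the one of Figure 4.6); `theta` in the comments stands for $\vartheta$.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy IIHS (not Figure 4.6): state -> outgoing transitions
# (action, probability, target). States 0 and 3-6 are secret states,
# 1, 2, 7 and 8 are observable states, the others are leaves.
Automaton = Dict[int, List[Tuple[str, float, int]]]
TOY: Automaton = {
    0: [("rich", 0.6, 1), ("poor", 0.4, 2)],
    1: [("cheap", 0.3, 3), ("exp", 0.7, 4)],
    2: [("cheap", 0.9, 5), ("exp", 0.1, 6)],
    3: [], 4: [("rich", 0.5, 7), ("poor", 0.5, 8)], 5: [], 6: [],
    7: [("cheap", 0.2, 9), ("exp", 0.8, 10)],
    8: [("cheap", 1.0, 11)],
    9: [], 10: [], 11: [],
}


def step(aut: Automaton, s: int, action: str) -> int:
    """Unique successor of s via `action` (Condition 6 of Definition 5)."""
    targets = [r for (label, _, r) in aut[s] if label == action]
    if len(targets) != 1:
        raise ValueError(f"no unique {action}-transition from state {s}")
    return targets[0]


def reach(aut: Automaton, root: int, alphas: Sequence[str],
          betas: Sequence[str]) -> int:
    """State whose path has secret trace `alphas` and observable trace `betas`,
    read in the order alpha_1, beta_1, alpha_2, ... (unique by Proposition 6)."""
    s = root
    for i, a in enumerate(alphas):
        s = step(aut, s, a)
        if i < len(betas):
            s = step(aut, s, betas[i])
    return s


def kernel(aut: Automaton, root: int, alphas: Sequence[str],
           betas: Sequence[str], beta_t: str) -> float:
    """p(beta_t | alpha^t, beta^{t-1}) = theta(s)(beta_t, s') as in Definition 7."""
    s = reach(aut, root, alphas, betas)
    return next(p for (label, p, _) in aut[s] if label == beta_t)


# t = 2: alpha^2 = (rich, poor), beta^1 = (exp) leads to state 8.
print(kernel(TOY, 0, ["rich", "poor"], ["exp"], "cheap"))  # 1.0
```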

The following example illustrates how to apply Definition 7, with the help 
of Proposition 6, to build the channel matrix of a simple example. 

Example 3. Let us consider an extended version of the website interactive system of Figure 4.1. We maintain the general definition of the system, i.e. there are two possible buyers (rich and poor, represented by rc. and pr., respectively) and two possible products (cheap and expensive, represented by chp. and exp., respectively). We still assume that offers are observable, since they are visible to everyone on the website, but the identity of buyers should be kept secret. We consider two consecutive rounds of offers and buys, which implies that, after normalization, $T = 3$. Figure 4.6 shows an automaton for this example in normalized form. Transitions with null probability are omitted, and the symbol $a_\tau$ is used as a placeholder to achieve the normalized IIHS.

To construct the stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$, we need to determine the conditional probability of an observable at time $t$ given the history up to time $t$.

Let us take the case $t = 2$ and compute the conditional probability of the observable $\beta_2 = \mathit{cheap}$ given that the history of secrets up to time $t = 2$ is $\alpha^2 = a_\tau, \mathit{poor}$ and the history of observables is $\beta^1 = \mathit{expensive}$. Applying Definition 7, we see that $p(\beta_2 = \mathit{cheap} \mid \alpha^2 = a_\tau, \mathit{poor}, \beta^1 = \mathit{expensive}) = \vartheta(s)(\mathit{cheap}, s')$. By Proposition 6, the traces $\alpha^2 = a_\tau, \mathit{poor}$ and $\beta^1 = \mathit{expensive}$ determine a unique state $s$ in the automaton, namely, the state $s = 5$. Moreover, from the state 5 a unique transition labeled with the action cheap is possible, leading to the state $s' = 11$. Therefore, we can conclude that $p(\beta_2 = \mathit{cheap} \mid \alpha^2 = a_\tau, \mathit{poor}, \beta^1 = \mathit{expensive}) = \vartheta(s = 5)(\mathit{cheap}, s' = 11) = p_{23}$.

Similarly, with $t = 1$ and history $\alpha^1 = a_\tau$, $\beta^0 = \epsilon$, the observable symbol $\beta_1 = \mathit{expensive}$ can be observed with probability $p(\beta_1 = \mathit{expensive} \mid \alpha^1 = a_\tau, \beta^0 = \epsilon) = \vartheta(s = 0)(\mathit{expensive}, s' = 2) = p_1$.

If $\mathcal{I}$ is fully probabilistic, then it also determines the input distribution and the dependency of $\alpha_t$ on $\beta^{t-1}$ (feedback) and on $\alpha^{t-1}$.

Definition 8. Let $\mathcal{I}$ be an IIHS. If $\mathcal{I}$ is fully probabilistic, the associated channel has a conditional input distribution for each $t$ defined as $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) = \vartheta(s)(\alpha_t, s')$, where $s$ is the state reached from the root via the path $\sigma$ whose secret and observable traces are $\alpha^{t-1}$ and $\beta^{t-1}$ respectively.
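The same traversal gives Definition 8, and alternating the two definitions yields the joint probability $p(\alpha^t, \beta^t)$ by the chain rule. As before, the automaton `TOY` and all names in this Python sketch are hypothetical illustrations, not taken from Figure 4.6.

```python
from typing import Dict, List, Sequence, Tuple

Automaton = Dict[int, List[Tuple[str, float, int]]]
# Hypothetical toy IIHS: state -> (action, probability, target) transitions.
TOY: Automaton = {
    0: [("rich", 0.6, 1), ("poor", 0.4, 2)],
    1: [("cheap", 0.3, 3), ("exp", 0.7, 4)],
    2: [("cheap", 0.9, 5), ("exp", 0.1, 6)],
    3: [], 4: [("rich", 0.5, 7), ("poor", 0.5, 8)], 5: [], 6: [],
    7: [("cheap", 0.2, 9), ("exp", 0.8, 10)],
    8: [("cheap", 1.0, 11)],
    9: [], 10: [], 11: [],
}


def step(aut: Automaton, s: int, action: str) -> int:
    """Unique successor of s via `action` (Condition 6)."""
    targets = [r for (label, _, r) in aut[s] if label == action]
    if len(targets) != 1:
        raise ValueError(f"no unique {action}-transition from state {s}")
    return targets[0]


def reach(aut: Automaton, root: int, alphas: Sequence[str],
          betas: Sequence[str]) -> int:
    """Follow alpha_1, beta_1, alpha_2, ... from the root (Proposition 6)."""
    s = root
    for i, a in enumerate(alphas):
        s = step(aut, s, a)
        if i < len(betas):
            s = step(aut, s, betas[i])
    return s


def prob_of(aut: Automaton, s: int, action: str) -> float:
    """theta(s)(action, s') for the unique s' reached via `action`."""
    return next(p for (label, p, _) in aut[s] if label == action)


def input_dist(aut, root, alphas, betas, alpha_t) -> float:
    """p(alpha_t | alpha^{t-1}, beta^{t-1}) as in Definition 8."""
    return prob_of(aut, reach(aut, root, alphas, betas), alpha_t)


def kernel(aut, root, alphas, betas, beta_t) -> float:
    """p(beta_t | alpha^t, beta^{t-1}) as in Definition 7."""
    return prob_of(aut, reach(aut, root, alphas, betas), beta_t)


def joint(aut, root, alphas, betas) -> float:
    """p(alpha^t, beta^t) by the chain rule, alternating Definitions 8 and 7."""
    prob = 1.0
    for t in range(1, len(alphas) + 1):
        prob *= input_dist(aut, root, alphas[:t - 1], betas[:t - 1], alphas[t - 1])
        prob *= kernel(aut, root, alphas[:t], betas[:t - 1], betas[t - 1])
    return prob


print(joint(TOY, 0, ["rich", "poor"], ["exp", "cheap"]))  # 0.6 * 0.7 * 0.5 * 1.0
```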

Example 4. Since the system of Example 3 is fully probabilistic, we can calculate the values of the conditional probabilities $\{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}$.

Let us take, for instance, the case where $t = 2$ and compute the conditional probability of the secret $\alpha_2 = \mathit{poor}$ given that the history of secrets up to time $t = 1$ is $\alpha^1 = a_\tau$ and the history of observables is $\beta^1 = \mathit{expensive}$. Applying Definition 8, we see that $p(\alpha_2 = \mathit{poor} \mid \alpha^1 = a_\tau, \beta^1 = \mathit{expensive}) = \vartheta(s)(\mathit{poor}, s')$. By Proposition 6, the traces $\alpha^1 = a_\tau$ and $\beta^1 = \mathit{expensive}$ determine a unique state $s$ in the automaton, namely, the state $s = 2$. Moreover, from the state 2 a unique transition labeled with the action poor is possible, leading to the state $s' = 5$. Therefore, we can conclude that $p(\alpha_2 = \mathit{poor} \mid \alpha^1 = a_\tau, \beta^1 = \mathit{expensive}) = \vartheta(s = 2)(\mathit{poor}, s' = 5) = q_{12}$.

Similarly, with $t = 3$ and history $\alpha^2 = a_\tau, \mathit{rich}$ and $\beta^2 = \mathit{cheap}, \mathit{expensive}$, the secret symbol $\alpha_3 = \mathit{rich}$ is chosen with probability $p(\alpha_3 = \mathit{rich} \mid \alpha^2 = a_\tau, \mathit{rich}, \beta^2 = \mathit{cheap}, \mathit{expensive}) = \vartheta(s = 10)(\mathit{rich}, s' = 22) = q_{24}$.

Figure 4.6: The normalized IIHS for the extended website example



4.3.2 Lifting the channel inputs to reaction functions 

Taken together, Definitions 7 and 8 show how to obtain the joint probabilities $p(\alpha^t, \beta^t)$ for a fully probabilistic IIHS. We still need to show, however, in what sense this joint probability distribution defines an information-theoretic channel.

The kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ determined by the IIHS trivially correspond to a channel's stochastic kernels. The problem resides in the conditional probabilities $\{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}$. In an information-theoretic channel, the value of $\alpha_t$ is determined in the encoder by a deterministic function $\varphi_t(\beta^{t-1})$. Therefore, inside the encoder there is no possibility for a probabilistic description of $\alpha_t$. The solution is to externalize this probabilistic behavior to the code functions.

As shown in [TM09], the original channel with feedback from input symbols $A^T$ to output symbols $B^T$ can be lifted to an equivalent channel without feedback from code functions $\mathcal{F}^T$ to output symbols $B^T$. This transformation also allows us to calculate the channel capacity. Let $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$ be a sequence of code function stochastic kernels and let $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ be a channel with memory and feedback. The channel from $\mathcal{F}^T$ to $B^T$ is constructed using a joint measure $Q(\varphi^T, A^T, B^T)$ that respects the following constraints:

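Definition 9. A measure $Q(\varphi^T, A^T, B^T)$ is said to be consistent with respect to the code function stochastic kernels $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$ and the channel $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ if, for each $t$:

1. There is no feedback to the code functions:
$$Q(\varphi_t \mid \varphi^{t-1}, \alpha^{t-1}, \beta^{t-1}) = p(\varphi_t \mid \varphi^{t-1})$$

2. The input is a function of the past outputs:
$$Q(\alpha_t \mid \varphi^t, \alpha^{t-1}, \beta^{t-1}) = \delta_{\{\varphi_t(\beta^{t-1})\}}(\alpha_t)$$
where $\delta$ is the Dirac measure;

3. The properties of the underlying channel are preserved:
$$Q(\beta_t \mid \varphi^t, \alpha^t, \beta^{t-1}) = p(\beta_t \mid \alpha^t, \beta^{t-1})$$

The following result states that there is only one consistent measure.

Theorem 10 ([TM09]). Given the probability distributions $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$ and a channel defined by $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$, there exists only one consistent measure $Q(\varphi^T, A^T, B^T)$. Furthermore, the channel from $\mathcal{F}^T$ to $B^T$ is given by:

$$Q(\beta_t \mid \varphi^t, \beta^{t-1}) = p\big(\beta_t \mid \varphi^t(\beta^{t-1}), \beta^{t-1}\big)$$

where $\varphi^t(\beta^{t-1})$ denotes the input sequence $\varphi_1(\beta^0), \varphi_2(\beta^1), \dots, \varphi_t(\beta^{t-1})$.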

Since in our setting the concept of encoder makes little sense, as there is no information to encode, we externalize the probabilistic behavior of $\alpha_t$ as follows. Code functions become a single set of reaction functions $\{\varphi_t\}_{t=1}^{T}$ with $\beta^{t-1}$ as parameter (the message $w$ no longer plays a role). Reaction functions can be seen as a model of how the environment reacts to given system outputs, producing new system inputs (they do not play the role of encoding a message). These reaction functions are endowed with a probability distribution that generates the probabilistic behavior of the values of $\alpha_t$.

Definition 11. A reactor is a distribution on reaction functions, i.e. a sequence of stochastic kernels $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$. A reactor $R$ is consistent with a fully probabilistic IIHS $\mathcal{I}$ if it induces the compatible distribution $Q(\varphi^T, A^T, B^T)$ such that, for every $1 \le t \le T$, $Q(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) = p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})$, where the latter is the probability distribution induced by $\mathcal{I}$.

The main result of this section states that for any fully probabilistic IIHS 
there is a reactor that generates the probabilistic behavior of the IIHS. Before 
moving to this result, we need to introduce a lemma. 

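Lemma 12. Let $\mathcal{X}, \mathcal{Y}$ be non-empty finite sets, and let $x \in \mathcal{X}$, $y \in \mathcal{Y}$. Let $p : \mathcal{X} \times \mathcal{Y} \to [0, 1]$ be a function such that, for every $x' \in \mathcal{X}$, we have $\sum_{y' \in \mathcal{Y}} p(x', y') = 1$. Then:

$$\sum_{\substack{f : \mathcal{X} \to \mathcal{Y}\\ f(x) = y}} \; \prod_{z \in \mathcal{X}} p(z, f(z)) = p(x, y)$$

Proof. By induction on the number of elements of $\mathcal{X}$.

Base case: $\mathcal{X} = \{x\}$. In this case the only function with $f(x) = y$ is the one mapping $x$ to $y$, so:

$$\sum_{\substack{f : \mathcal{X} \to \mathcal{Y}\\ f(x) = y}} \; \prod_{z \in \mathcal{X}} p(z, f(z)) = p(x, f(x)) = p(x, y)$$

Inductive case: Let $\mathcal{X} = \mathcal{X}' \cup \{x'\}$, with $x' \notin \mathcal{X}'$ and $x \in \mathcal{X}'$. Then:

$$
\begin{aligned}
\sum_{\substack{f : \mathcal{X} \to \mathcal{Y}\\ f(x) = y}} \; \prod_{z \in \mathcal{X}} p(z, f(z))
&= \Big(\sum_{\substack{g : \mathcal{X}' \to \mathcal{Y}\\ g(x) = y}} \; \prod_{z \in \mathcal{X}'} p(z, g(z))\Big) \cdot \sum_{y' \in \mathcal{Y}} p(x', y') && \text{(by distributivity)}\\
&= \sum_{\substack{g : \mathcal{X}' \to \mathcal{Y}\\ g(x) = y}} \; \prod_{z \in \mathcal{X}'} p(z, g(z)) && \text{(by the assumption)}\\
&= p(x, y) && \text{(by ind. hyp.)}
\end{aligned}
$$

□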
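Lemma 12 can be checked by brute force on a small instance: enumerate every function $f : \mathcal{X} \to \mathcal{Y}$, keep those with $f(x) = y$, and add up the products. The sets and the function `p` below are arbitrary choices for illustration; the only requirement from the lemma is that each row of `p` sums to 1.

```python
from itertools import product
from math import isclose, prod


def lemma12_lhs(X, Y, p, x, y):
    """Sum over all f : X -> Y with f(x) = y of prod_{z in X} p(z, f(z))."""
    total = 0.0
    for values in product(Y, repeat=len(X)):
        f = dict(zip(X, values))                 # one function X -> Y
        if f[x] == y:
            total += prod(p[(z, f[z])] for z in X)
    return total


# Hypothetical p whose rows sum to 1, as the lemma requires.
X, Y = ["u", "v", "w"], [0, 1]
p = {("u", 0): 0.3, ("u", 1): 0.7,
     ("v", 0): 0.5, ("v", 1): 0.5,
     ("w", 0): 0.9, ("w", 1): 0.1}

for x in X:
    for y in Y:
        assert isclose(lemma12_lhs(X, Y, p, x, y), p[(x, y)])
print("Lemma 12 holds on this instance")
```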



Theorem 13. Let $\mathcal{I}$ be a fully probabilistic IIHS inducing the joint probability distribution $p(\alpha^t, \beta^t)$, $1 \le t \le T$, on secret and observable traces. It is always possible to construct a channel with memory and feedback, and an associated probability distribution $Q(\varphi^T, A^T, B^T)$, which corresponds to $\mathcal{I}$ in the sense that, for every $1 \le t \le T$ and all $\alpha^t, \beta^t$, the equality $Q(\alpha^t, \beta^t) = p(\alpha^t, \beta^t)$ holds.

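Proof. First note that, by the laws of probability, $Q(\alpha^t, \beta^t) = \sum_{\varphi^t} Q(\varphi^t, \alpha^t, \beta^t)$. So we need to show that $\sum_{\varphi^t} Q(\varphi^t, \alpha^t, \beta^t) = p(\alpha^t, \beta^t)$, by induction on $t$.

Base case: $t = 1$. Let us define $Q(\varphi_1 \mid \epsilon) = p(\varphi_1(\epsilon))$ and $Q(\beta_1 \mid \alpha_1, \epsilon) = p(\beta_1 \mid \alpha_1)$. Then:

$$
\begin{aligned}
\sum_{\varphi_1} Q(\varphi_1, \alpha_1, \beta_1)
&= \sum_{\varphi_1} Q(\varphi_1 \mid \epsilon)\, Q(\alpha_1 \mid \varphi_1, \epsilon)\, Q(\beta_1 \mid \varphi_1, \alpha_1, \epsilon) && \text{(by the chain rule)}\\
&= \sum_{\varphi_1} Q(\varphi_1 \mid \epsilon)\, \delta_{\{\varphi_1(\epsilon)\}}(\alpha_1)\, Q(\beta_1 \mid \alpha_1, \epsilon) && \text{(by Definition 9)}\\
&= \sum_{\varphi_1} p(\varphi_1(\epsilon))\, \delta_{\{\varphi_1(\epsilon)\}}(\alpha_1)\, p(\beta_1 \mid \alpha_1) && \text{(by construction of } Q\text{)}\\
&= p(\alpha_1)\, p(\beta_1 \mid \alpha_1) && \text{(by definition of } \delta\text{)}\\
&= p(\alpha_1, \beta_1)
\end{aligned}
$$

Inductive case: Let us define $Q(\beta_t \mid \alpha^t, \beta^{t-1}) = p(\beta_t \mid \alpha^t, \beta^{t-1})$, and

$$Q(\varphi_t \mid \varphi^{t-1}) = \prod_{\beta^{t-1} \in B^{t-1}} p\big(\varphi_t(\beta^{t-1}) \mid \varphi^{t-1}(\beta^{t-2}), \beta^{t-1}\big)$$

Note that, if we consider $\mathcal{X} = \{\beta^{t-1} \mid \beta_i \in B, 1 \le i \le t-1\}$, $\mathcal{Y} = A$, and $p(\beta^{t-1}, \alpha_t) = p(\alpha_t \mid \varphi^{t-1}(\beta^{t-2}), \beta^{t-1})$, then $\mathcal{X}$, $\mathcal{Y}$ and $p$ satisfy the hypothesis of Lemma 12. Moreover, by Definition 9, $Q(\varphi^{t-1}, \alpha^{t-1}, \beta^{t-1}) = 0$ unless $\varphi^{t-1}(\beta^{t-2}) = \alpha^{t-1}$. Then:

$$
\begin{aligned}
\sum_{\varphi^t} Q(\varphi^t, \alpha^t, \beta^t)
&= \sum_{\varphi^t} Q(\varphi^{t-1}, \alpha^{t-1}, \beta^{t-1})\, Q(\varphi_t \mid \varphi^{t-1}, \alpha^{t-1}, \beta^{t-1})\, Q(\alpha_t \mid \varphi^t, \alpha^{t-1}, \beta^{t-1})\, Q(\beta_t \mid \varphi^t, \alpha^t, \beta^{t-1}) && \text{(by the chain rule)}\\
&= \sum_{\varphi^t} Q(\varphi^{t-1}, \alpha^{t-1}, \beta^{t-1})\, Q(\varphi_t \mid \varphi^{t-1})\, \delta_{\{\varphi_t(\beta^{t-1})\}}(\alpha_t)\, Q(\beta_t \mid \alpha^t, \beta^{t-1}) && \text{(by Definition 9)}\\
&= \sum_{\varphi^{t-1}} Q(\varphi^{t-1}, \alpha^{t-1}, \beta^{t-1})\, p(\beta_t \mid \alpha^t, \beta^{t-1}) \sum_{\varphi_t} \prod_{\beta'^{t-1}} p\big(\varphi_t(\beta'^{t-1}) \mid \varphi^{t-1}(\beta'^{t-2}), \beta'^{t-1}\big)\, \delta_{\{\varphi_t(\beta^{t-1})\}}(\alpha_t) && \text{(by constr. of } Q\text{)}\\
&= \sum_{\varphi^{t-1}} Q(\varphi^{t-1}, \alpha^{t-1}, \beta^{t-1})\, p(\beta_t \mid \alpha^t, \beta^{t-1}) \sum_{\varphi_t(\beta^{t-1}) = \alpha_t} \; \prod_{\beta'^{t-1}} p\big(\varphi_t(\beta'^{t-1}) \mid \varphi^{t-1}(\beta'^{t-2}), \beta'^{t-1}\big) && \text{(by definition of } \delta\text{)}\\
&= \sum_{\varphi^{t-1}} Q(\varphi^{t-1}, \alpha^{t-1}, \beta^{t-1})\, p(\beta_t \mid \alpha^t, \beta^{t-1})\, p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) && \text{(by Lemma 12)}\\
&= p(\beta_t \mid \alpha^t, \beta^{t-1})\, p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\, p(\alpha^{t-1}, \beta^{t-1}) && \text{(by ind. hyp.)}\\
&= p(\alpha^t, \beta^t) && \text{(by the chain rule)}
\end{aligned}
$$

□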

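Corollary 14. Let $\mathcal{I}$ be a fully probabilistic IIHS. Let $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ be a sequence of stochastic kernels and $\{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}$ a sequence of input distributions defined by $\mathcal{I}$ according to Definitions 7 and 8. Then the reactor $R = \{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$ consistent with $\mathcal{I}$ is given by:

$$p(\varphi_1) = p(\alpha_1 \mid \alpha^0, \beta^0) = p(\alpha_1), \quad \text{where } \alpha_1 = \varphi_1(\epsilon) \tag{4.8}$$

$$p(\varphi_t \mid \varphi^{t-1}) = \prod_{\beta^{t-1} \in B^{t-1}} p\big(\varphi_t(\beta^{t-1}) \mid \varphi^{t-1}(\beta^{t-2}), \beta^{t-1}\big), \qquad 2 \le t \le T \tag{4.9}$$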
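For $t = 2$, where $\varphi^{1}(\beta^{0}) = \varphi_1(\epsilon) = \alpha_1$, formula (4.9) can be made concrete with a short Python sketch. The secrets, observables and the input distribution `P2` are hypothetical; a reaction function $\varphi_2 : B \to A$ is encoded as the tuple of its values. The sketch checks the two facts the proof of Theorem 13 relies on: the reactor step is a probability distribution, and the total mass of the reaction functions with $\varphi_2(\beta_1) = \alpha_2$ gives back $p(\alpha_2 \mid \alpha_1, \beta_1)$, as Lemma 12 predicts.

```python
from itertools import product
from math import isclose, prod

A = ("rich", "poor")    # secrets (hypothetical)
B = ("cheap", "exp")    # observables (hypothetical)

# Hypothetical input distribution p(alpha_2 | alpha_1, beta_1) of a toy IIHS.
P2 = {
    ("rich", "cheap"): {"rich": 0.2, "poor": 0.8},
    ("rich", "exp"):   {"rich": 0.5, "poor": 0.5},
    ("poor", "cheap"): {"rich": 0.9, "poor": 0.1},
    ("poor", "exp"):   {"rich": 0.4, "poor": 0.6},
}


def reactor_step2(alpha1):
    """p(phi_2 | phi_1) as in (4.9), for the phi_1 with phi_1(epsilon) = alpha1.
    A reaction function phi_2 : B -> A is the tuple (phi_2(b) for b in B)."""
    return {values: prod(P2[(alpha1, b)][a] for b, a in zip(B, values))
            for values in product(A, repeat=len(B))}


def induced_input(alpha1, beta1, alpha2):
    """Q(alpha_2 | alpha_1, beta_1): mass of the phi_2 with phi_2(beta_1) = alpha_2."""
    return sum(q for values, q in reactor_step2(alpha1).items()
               if values[B.index(beta1)] == alpha2)


for a1 in A:
    assert isclose(sum(reactor_step2(a1).values()), 1.0)
    for b1, a2 in product(B, A):
        assert isclose(induced_input(a1, b1, a2), P2[(a1, b1)][a2])
print("reactor reproduces p(alpha_2 | alpha_1, beta_1)")
```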



Figure 4.7 depicts the model for IIHSs. Note that, in relation to Figure 4.2, there are some simplifications: (1) no message $W$ is needed; (2) the encoder becomes an "interactor"; (3) the decoder is not used. At the beginning, a reaction function sequence $\varphi^T$ is chosen and then the channel is used $T$ times. At each usage $t$, the interactor produces the next input symbol $\alpha_t$ by applying the reaction function $\varphi_t$ to the fed back output $\beta^{t-1}$. Then the channel produces an output $\beta_t$ based on the stochastic kernel $p(\beta_t \mid \alpha^t, \beta^{t-1})$. The output is then fed back to the interactor, which uses it for producing the next input.



Figure 4.7: Channel with memory and feedback model for IIHS (diagram components: reaction functions, "interactor", channel, delay)



We conclude this section by remarking on an intriguing coincidence: the notion of reaction function sequence $\varphi^T$ on IIHSs corresponds to the notion of deterministic scheduler [Seg95]. In fact, each reaction function $\varphi_t$ selects the next step, $\alpha_t$, on the basis of $\beta^{t-1}$ and $\alpha^{t-1}$ (generated by $\varphi^{t-1}$), and $\beta^{t-1}$ and $\alpha^{t-1}$ represent the path up to that state.

4.4 Leakage in interactive systems 

In this section we propose a definition for the notion of leakage in interactive 
systems. We first argue that mutual information is not the correct notion, and 
we propose to replace it with the directed information instead. 

In the case of channels with memory and feedback, mutual information is defined as $I(A^T; B^T) = H(A^T) - H(A^T \mid B^T)$, and it is still symmetric (i.e. $I(A^T; B^T) = I(B^T; A^T)$). Since the roles of $A^T$ and $B^T$ in $I(A^T; B^T)$ are interchangeable, this concept cannot capture causality, in the sense that it does not imply that $A^T$ causes $B^T$, nor conversely. Mutual information expresses correlation between the sequences of random variables $A^T$ and $B^T$.

Mathematically, the mutual information $I(A^T; B^T)$ for $T$ uses of the channel can be expressed with the help of the chain rule of (3.4) in the following way:

$$I(A^T; B^T) = \sum_{t=1}^{T} I(A^T; B_t \mid B^{t-1})$$

In the equation above, each term of the sum is the mutual information between the random variable $B_t$ and the whole sequence of random variables $A^T = A_1, \dots, A_T$, given the history $B^{t-1}$. The equation emphasizes that at time $1 \le t \le T$, even though only the inputs $\alpha^t = \alpha_1, \alpha_2, \dots, \alpha_t$ have been fed to the channel, the whole sequence $A^T$, including $A_{t+1}, \dots, A_T$, has a statistical correlation with $B_t$. Indeed, in the presence of feedback, $B_t$ may influence $A_{t+1}, A_{t+2}, \dots, A_T$.

In order to show how the concept of directed information contrasts with the above, let us recall its definition:

$$I(A^T \to B^T) = \sum_{t=1}^{T} I(A^t; B_t \mid B^{t-1}),$$

$$I(B^T \to A^T) = \sum_{t=1}^{T} I(A_t; B^{t-1} \mid A^{t-1}).$$

These notions capture the concept of causality, to which the definition of mutual information is indifferent. The correlation between inputs and outputs $I(A^T; B^T)$ is split into the information $I(A^T \to B^T)$ that flows from input to output through the channel and the information $I(B^T \to A^T)$ that flows from output to input via feedback. Note that the directed information is not symmetric: the flow from $A^T$ to $B^T$ takes into account the correlation between $A^t$ and $B_t$, while the flow from $B^T$ to $A^T$ takes into account the correlation between $B^{t-1}$ and $A_t$.

It was proved in [TM09] that

$$I(A^T; B^T) = I(A^T \to B^T) + I(B^T \to A^T) \tag{4.10}$$

i.e. the mutual information is the sum of the directed information flows in both directions. Note that this formulation highlights the symmetry of mutual information from yet another perspective.

Once we split mutual information into directed information in the two opposite directions, it is important to understand the different roles that the information flow in each direction plays. $I(A^T \to B^T)$ represents the system behavior: via the channel, the information flows from inputs to outputs according to the specification of the system, modeled by the channel stochastic kernels. This flow represents the amount of information an attacker can gain about the inputs by observing the outputs, and we argue that this is the real information leakage.

On the other hand, $I(B^T \to A^T)$ represents how the environment reacts to the system: given the system outputs, the environment produces new inputs. We argue that the information flow from outputs to inputs is independent of any particular system: it is a characteristic of the environment itself. Hence, if an attacker knows how the environment reacts to outputs (the probabilistic behavior of the reactions of the environment given the system outputs), this knowledge is part of the a priori knowledge of the adversary. As a further justification, observe that this is a natural extension of the classical approach, where the choice of secrets is seen as external to the system, i.e. determined by the environment. The probability distribution on the secrets constitutes the a priori knowledge and does not count as leakage. In order to encompass the classical approach, in our extended model we should preserve this principle, and a natural way to do so is to consider the secret choices, at every stage of the computation, as external. Their probability distributions, which are now in general conditional probability distributions depending on the history of secrets and observables, should therefore be considered as part of the external knowledge, and not counted as leakage.

The following example supports our claim that, in the presence of feedback, 
mutual information is not a correct notion of leakage. 

Example 5. Consider the discrete memoryless channel with secret alphabet $A = \{a_1, a_2\}$ and observable alphabet $B = \{b_1, b_2\}$ whose matrix is represented in Table 4.6.

$$
\begin{array}{c|cc}
 & b_1 & b_2\\ \hline
a_1 & 0.5 & 0.5\\
a_2 & 0.5 & 0.5
\end{array}
$$

Table 4.6: Channel matrix for Example 5

Suppose that the channel is used with feedback, in such a way that, for all $1 \le t < T$, we have $\alpha_{t+1} = a_1$ if $\beta_t = b_1$, and $\alpha_{t+1} = a_2$ if $\beta_t = b_2$. It is easy to show that if $T \ge 2$ then $I(A^T; B^T) \neq 0$. Yet there is no leakage from $A^T$ to $B^T$, since the rows of the matrix are all equal. We have indeed that $I(A^T \to B^T) = 0$, and the mutual information $I(A^T; B^T)$ is only due to the feedback information flow $I(B^T \to A^T)$.
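Example 5 can be checked numerically with a brute-force Python sketch. It assumes that the first secret $\alpha_1$ is uniform (the example does not fix it) and encodes $a_i$ and $b_i$ by their indices $i$. It builds the joint distribution of $(A^T, B^T)$ induced by the channel of Table 4.6 and the feedback rule above, and evaluates the three quantities related by (4.10), using the identity $I(X; Y \mid Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)$.

```python
from collections import defaultdict
from itertools import product
from math import log2


def entropy(joint, key):
    """Entropy (bits) of key(alphas, betas) under the joint distribution."""
    marg = defaultdict(float)
    for (a, b), p in joint.items():
        marg[key(a, b)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(joint, lambda a, b: (x(a, b), z(a, b)))
            + entropy(joint, lambda a, b: (y(a, b), z(a, b)))
            - entropy(joint, lambda a, b: (x(a, b), y(a, b), z(a, b)))
            - entropy(joint, z))


def example5_joint(T):
    """Joint distribution of (alpha^T, beta^T) for Example 5 (indices 1, 2)."""
    joint = {}
    for betas in product((1, 2), repeat=T):   # each output is b_1 or b_2 w.p. 0.5
        for a1 in (1, 2):                       # assumption: alpha_1 uniform
            alphas = (a1,) + betas[:-1]         # feedback: alpha_{t+1} copies beta_t
            joint[(alphas, betas)] = 0.5 * 0.5 ** T
    return joint


def leakage_report(T):
    """Return (I(A^T;B^T), I(A^T -> B^T), I(B^T -> A^T))."""
    j = example5_joint(T)
    mi = cond_mi(j, lambda a, b: a, lambda a, b: b, lambda a, b: ())
    ab = sum(cond_mi(j, lambda a, b, t=t: a[:t], lambda a, b, t=t: b[t - 1],
                     lambda a, b, t=t: b[:t - 1]) for t in range(1, T + 1))
    ba = sum(cond_mi(j, lambda a, b, t=t: a[t - 1], lambda a, b, t=t: b[:t - 1],
                     lambda a, b, t=t: a[:t - 1]) for t in range(1, T + 1))
    return mi, ab, ba


for T in (1, 2, 3):
    print(T, leakage_report(T))
```

For $T = 2$ and $T = 3$ this should give $I(A^T; B^T) = I(B^T \to A^T) = T - 1$ bits and $I(A^T \to B^T) = 0$, while for $T = 1$ all three quantities vanish.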

Having in mind the above discussion, we now propose a notion of infor- 
mation flow based on our model. We follow the idea of defining leakage and 
maximum leakage using the concepts of mutual information and capacity, mak- 
ing the necessary adaptations. 

As discussed in Chapter 3, in the non-interactive case the definition of leakage as mutual information, for a single use of the channel, is

$$I(A; B) = H(A) - H(A \mid B)$$

(cf. for instance [CPP08a, KB07]). This amounts to viewing the leakage as the difference between the a priori invulnerability and the a posteriori one. As explained in Chapter 3, these correspond to $H(A)$ and $H(A \mid B)$, respectively. This corresponds to the model of an attacker based on Shannon entropy discussed by Köpf and Basin in [KB07].

In the interactive case, we can extend this notion by considering the leakage at every step $t$ as given by

$$I(A^t; B_t \mid B^{t-1}) = H(A^t \mid B^{t-1}) - H(A^t \mid B_t, B^{t-1})$$

The notion of attack is the same, modulo the fact that we consider all the input from the beginning up to step $t$, and the difference in its vulnerability induced by the observation of $B_t$ (the output at step $t$), taking into account the observation history $B^{t-1}$. It is then natural to consider as total leakage the summation of the contributions $I(A^t; B_t \mid B^{t-1})$ for all the steps $t$. This is exactly the notion of directed information (cfr. Definition 4):

$$I(A^T \to B^T) = \sum_{t=1}^{T} I(A^t; B_t \mid B^{t-1})$$

Definition 15. The information leakage of a fully probabilistic IIHS is defined as the directed information $I(A^T \to B^T)$ of the associated channel with memory and feedback.

We now show an equivalent formulation of directed information that leads 
to a new interpretation in terms of an attack model. First we need the following 
lemma. 

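Lemma 16. $I(B^T \to A^T) = H(A^T) - \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1})$

Proof.

$$\begin{aligned}
I(B^T \to A^T) &= \sum_{t=1}^{T} I(A_t; B^{t-1} \mid A^{t-1}) && \text{(by Definition 4)}\\
&= \sum_{t=1}^{T} \big( H(A_t \mid A^{t-1}) - H(A_t \mid A^{t-1}, B^{t-1}) \big) && \text{(by def. of mutual info.)}\\
&= H(A^T) - \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1}) && \text{(by the chain rule)}
\end{aligned}$$

□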

The next proposition points out the announced alternative formulation of directed information from input to output:

Proposition 17. $I(A^T \to B^T) = \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1}) - H(A^T \mid B^T)$

Proof.

$$\begin{aligned}
I(A^T \to B^T) &= I(A^T; B^T) - I(B^T \to A^T) && \text{(by (4.10))}\\
&= I(A^T; B^T) - H(A^T) + \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1}) && \text{(by Lemma 16)}\\
&= H(A^T) - H(A^T \mid B^T) - H(A^T) + \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1}) && \text{(by def. of mutual info.)}\\
&= \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1}) - H(A^T \mid B^T)
\end{aligned}$$

□
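Lemma 16, Proposition 17 and the conservation law (4.10) can be sanity-checked numerically. The Python sketch below (all names are ours) draws a random channel with memory and feedback for $T = 2$ over binary alphabets, in which the reactor step $p(a_2 \mid a_1, b_1)$ depends on the previous output, and returns the absolute gaps in the three identities; they should be at the level of floating-point rounding. A random check of this kind illustrates the identities but does not replace the proofs.

```python
import math
import random
from collections import defaultdict
from itertools import product


def entropy(joint, idx):
    """Entropy (bits) of the marginal of `joint` on the coordinates in `idx`."""
    marginal = defaultdict(float)
    for outcome, pr in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += pr
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_entropy(joint, x, z=()):
    """H(X | Z) = H(X, Z) - H(Z)."""
    return entropy(joint, tuple(x) + tuple(z)) - entropy(joint, tuple(z))


def cond_mi(joint, x, y, z=()):
    """I(X; Y | Z) = H(X | Z) - H(X | Y, Z)."""
    return cond_entropy(joint, x, z) - cond_entropy(joint, x, tuple(y) + tuple(z))


def random_dist(rng, n):
    weights = [rng.random() + 0.05 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_feedback_joint(rng):
    """Hypothetical T = 2 system, binary alphabets, outcome = (a1, b1, a2, b2)."""
    bits = (0, 1)
    p_a1 = random_dist(rng, 2)
    p_b1 = {a1: random_dist(rng, 2) for a1 in bits}
    p_a2 = {h: random_dist(rng, 2) for h in product(bits, bits)}        # reactor: depends on (a1, b1)
    p_b2 = {h: random_dist(rng, 2) for h in product(bits, bits, bits)}  # channel: depends on (a1, b1, a2)
    return {(a1, b1, a2, b2): p_a1[a1] * p_b1[a1][b1] * p_a2[(a1, b1)][a2] * p_b2[(a1, b1, a2)][b2]
            for a1, b1, a2, b2 in product(bits, repeat=4)}


A1, B1, A2, B2 = 0, 1, 2, 3   # coordinates inside an outcome tuple


def identity_gaps(joint):
    """Absolute errors of Lemma 16, Proposition 17 and (4.10) for T = 2."""
    mi = cond_mi(joint, (A1, A2), (B1, B2))
    di_ab = cond_mi(joint, (A1,), (B1,)) + cond_mi(joint, (A1, A2), (B2,), (B1,))
    di_ba = cond_mi(joint, (A2,), (B1,), (A1,))   # the t = 1 term I(A_1; B^0) is 0
    h_r = cond_entropy(joint, (A1,)) + cond_entropy(joint, (A2,), (A1, B1))   # reactor entropy
    return (abs(di_ba - (entropy(joint, (A1, A2)) - h_r)),                   # Lemma 16
            abs(di_ab - (h_r - cond_entropy(joint, (A1, A2), (B1, B2)))),    # Proposition 17
            abs(mi - (di_ab + di_ba)))                                        # (4.10)


print(identity_gaps(random_feedback_joint(random.Random(0))))
```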

We note that the term $\sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1})$ can be seen as the entropy of the reactor, i.e. the entropy of the inputs, taking into account their dependency on the previous outputs. This brings us to an intriguing alternative interpretation of leakage.

Remark 18. The leakage can be seen as the difference between the a priori invulnerability degree of the whole secret $A^T$, assuming that the attacker knows the distribution of the reactor, and the a posteriori invulnerability degree, after the adversary has observed the whole output $B^T$.

In Section 4.5 we give an extensive and detailed example of how to calculate 
the leakage for an actual security protocol. 

In the case of secret-nondeterministic IIHS, we have a stochastic kernel 
but no distribution on the reaction functions. In this case it seems natural to 
consider the worst leakage over all possible distributions on reaction functions. 
This is exactly the concept of capacity. 

Definition 19. The maximum leakage of a secret-nondeterministic IIHS is defined as the capacity $C_T$ of the associated channel with memory and feedback (cfr. (4.7)).

A comparison with the definition of Gray (cfr. [Gra91], Definition 5.3) is in order. As explained in the introduction, Gray's model is more complicated than ours, because it assumes that low and high variables are present at both ends of the channel. If we restrict the definition of Gray's capacity $C_H$ to our case, by eliminating the low input and the high output, we obtain the following formula:

$$C_H = \sup \frac{1}{T} \sum_{t=1}^{T} I(A^{t-1}; B_t \mid B^{t-1}) \qquad (4.11)$$

By comparing (4.7), which is based on Definition 4, to (4.11), we can see that the only difference is that (4.11) considers the correlation between $B_t$ and $A^{t-1}$ instead of $A^t$. This seems to be intentional (cfr. [Gra91], discussion after Definition 4.1). We are not sure why $C_H$ is defined in this way; our best guess is that the high values must be those of the previous time step in order to encompass the theory of McLean [McL90]. In any case, Gray's conjecture that $C_H$ corresponds to the channel transmission rate does not hold. For instance, it is easy to see that for $T = 1$ we always have $C_H = 0$, but there obviously are channels which can transmit a non-zero amount of information even with one single use.
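As a worked instance of the $T = 1$ case (our own illustration, not taken from [Gra91]), consider the noiseless binary channel with $\mathcal{A} = \mathcal{B} = \{0, 1\}$, whose output always equals its input, used once with the uniform input distribution. Since $A^0$ and $B^0$ are empty sequences:

$$\begin{aligned}
\text{summand of (4.11):}\quad & I(A^0; B_1 \mid B^0) = 0 \ \text{ for every input distribution, hence } C_H = 0,\\
\text{directed information:}\quad & I(A^1 \to B^1) = I(A^1; B_1 \mid B^0) = I(A_1; B_1) = H(A_1) - H(A_1 \mid B_1) = 1 - 0 = 1 \text{ bit}.
\end{aligned}$$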

We conclude this section by showing that our approach to the notion of leakage generalizes the classical approach (based on mutual information) to the case of feedback. The idea is that, if a channel does not have feedback, then $I(B^T \to A^T) = 0$ and therefore $I(A^T; B^T) = I(A^T \to B^T)$. In our opinion, the fact that mutual information turns out to be a particular case of directed information helps to justify the former as a good measure of information flow, despite its symmetry: in channels without feedback it is a good measure because it coincides with directed information from input to output.

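Lemma 20. In absence of feedback, $I(B^T \to A^T) = 0$.

Proof. When feedback is not allowed, $A_t$ and $B^{t-1}$ are conditionally independent given $A^{t-1}$, for every $1 \le t \le T$. Then:

$$\begin{aligned}
I(B^T \to A^T) &= \sum_{t=1}^{T} I(A_t; B^{t-1} \mid A^{t-1}) && \text{(by Definition 4)}\\
&= \sum_{t=1}^{T} \big( H(A_t \mid A^{t-1}) - H(A_t \mid A^{t-1}, B^{t-1}) \big) && \text{(by def. of mutual info.)}\\
&= \sum_{t=1}^{T} \big( H(A_t \mid A^{t-1}) - H(A_t \mid A^{t-1}) \big) && \text{(by conditional independence)}\\
&= 0
\end{aligned}$$

□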

Proposition 21. In absence of feedback, leakage can be equivalently defined 
as directed information or as mutual information. Similarly, in absence of 
feedback, the maximum leakage can be equivalently defined as directed capacity 
or as capacity. 

Proof. It follows directly from Lemma 20 and (4.10). □ 
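Lemma 20 and Proposition 21 can be illustrated numerically. In the hypothetical $T = 2$ system below, the second secret depends on the first secret but not on the first output, so there is no feedback, while the channel still has memory. The Python sketch (names are ours) should then find $I(B^T \to A^T) \approx 0$ and $I(A^T; B^T) \approx I(A^T \to B^T)$ up to rounding.

```python
import math
import random
from collections import defaultdict
from itertools import product


def entropy(joint, idx):
    """Entropy (bits) of the marginal of `joint` on the coordinates in `idx`."""
    marginal = defaultdict(float)
    for outcome, pr in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += pr
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mi(joint, x, y, z=()):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    x, y, z = tuple(x), tuple(y), tuple(z)
    return (entropy(joint, x + z) + entropy(joint, y + z)
            - entropy(joint, x + y + z) - entropy(joint, z))


def random_dist(rng, n):
    weights = [rng.random() + 0.05 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def random_no_feedback_joint(rng):
    """Hypothetical T = 2 system: a2 depends on a1 only, b2 on (a1, b1, a2)."""
    bits = (0, 1)
    p_a1 = random_dist(rng, 2)
    p_a2 = {a1: random_dist(rng, 2) for a1 in bits}                        # no dependence on b1
    p_b1 = {a1: random_dist(rng, 2) for a1 in bits}
    p_b2 = {h: random_dist(rng, 2) for h in product(bits, bits, bits)}    # channel with memory
    return {(a1, b1, a2, b2): p_a1[a1] * p_b1[a1][b1] * p_a2[a1][a2] * p_b2[(a1, b1, a2)][b2]
            for a1, b1, a2, b2 in product(bits, repeat=4)}


A1, B1, A2, B2 = 0, 1, 2, 3   # coordinates inside an outcome tuple


def compare(joint):
    """Return (I(A^2; B^2), I(A^2 -> B^2), I(B^2 -> A^2))."""
    mi = cond_mi(joint, (A1, A2), (B1, B2))
    di_ab = cond_mi(joint, (A1,), (B1,)) + cond_mi(joint, (A1, A2), (B2,), (B1,))
    di_ba = cond_mi(joint, (A2,), (B1,), (A1,))
    return mi, di_ab, di_ba


print(compare(random_no_feedback_joint(random.Random(1))))
```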
4.5 An example: the Cocaine Auction protocol

In this section we show the application of our approach to the Cocaine Auction Protocol [SA99]. The formalization of this protocol in terms of IIHSs using our framework makes it possible to prove the claim, suggested in [SA99], that if the seller knows the identity of the bidders then the (strong) anonymity guarantees are no longer assured.

Let us consider a scenario in which several mobsters are gathered around 
a table. An auction is about to be held in which one of them offers his next 
shipment of cocaine to the highest bidder. The seller describes the merchandise 
and proposes a starting price. The others then bid increasing amounts until 
there are no bids for, say, 30 consecutive seconds. At that point the seller 
declares the auction closed and arranges a secret appointment with the winner 
to deliver the goods. 

The basic protocol is fairly simple and is organized as a succession of rounds of bidding. Round $i$ starts with the seller announcing the bid price $b_i$ for that round. Buyers have $t$ seconds to make an offer (i.e. to say yes, meaning "I'm willing to buy at the current bid price $b_i$"). As soon as one buyer anonymously says yes, he becomes the winner $w_i$ of that round and a new round begins. If nobody says anything for $t$ seconds, round $i$ is concluded by timeout and the auction is won by the winner $w_{i-1}$ of the previous round, if one exists. If the timeout occurs during round 0, this means that nobody made any offers at the initial price, so there is no sale.
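The round structure just described can be sketched as follows in Python. The function name and the encoding of a run as two lists are ours; anonymity, timing and the seller's choice of increments are abstracted away, and what happens if the listed rounds run out without a timeout is an assumption flagged in the code.

```python
from typing import List, Optional, Tuple


def auction_outcome(offers: List[Optional[str]],
                    prices: List[int]) -> Tuple[Optional[str], Optional[int]]:
    """Return (winner, price) for one run of the basic round-based protocol.

    offers[i] is the buyer who said "yes" in round i, or None if round i ended
    by timeout; prices[i] is the bid price b_i announced in round i.
    """
    winner, price = None, None          # no winner exists before round 0
    for offer, bid in zip(offers, prices):
        if offer is None:               # timeout: the winner w_{i-1} of the previous round wins
            return winner, price
        winner, price = offer, bid      # w_i = offer, and a new round begins
    return winner, price                # assumption: exhausting the listed rounds closes the auction


print(auction_outcome(["Scarface", "Candlemaker", None], [100, 200, 400]))
```

A timeout in round 0 yields `(None, None)`, i.e. no sale.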

Although our framework allows the formalization of this protocol for an arbitrary number of bidders and bidding rounds, for illustration purposes we will consider the case of two bidders (Candlemaker and Scarface) and two rounds of bids. Furthermore, we assume that the initial bid is always 100 euros, so the first bid does not need to be announced by the seller. In each turn the seller can choose how much he wants to increase the current bid value. This is done by adding an increment to the last bid. There are two options of increments, namely $inc_1$ (100 euros) and $inc_2$ (200 euros). In that way, $b_{i+1}$ is either $b_i + inc_1$ or $b_i + inc_2$. We can describe this protocol as a normalized IIHS $\mathcal{I} = (M, \mathcal{A}, \mathcal{B})$, where $\mathcal{A} = \{Candlemaker, Scarface, a_*\}$ is the set of secret actions, $\mathcal{B} = \{inc_1, inc_2, b_*\}$ is the set of observable actions, and the probabilistic automaton $M$ is represented in Figure 4.8. For clarity reasons, transitions with probability 0 are not represented in the automaton. Note that the special secret action $a_*$ represents the situation where neither Candlemaker nor Scarface bids. The special observable action $b_*$ represents the end of the auction, and it can only occur if no one has bid in the round.

Table 4.7 shows all the stochastic kernels for this example. 

The next step is to construct all possible reaction functions $\{\varphi_t(\beta^{t-1})\}_{t=1}^{T}$. As seen in Section 4.3.2, the reaction functions correspond to the encoder in the channel. They take the feedback history and decide how the world will react to this situation. Table 4.8 contains the reaction functions for each time $t \le 2$.
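(a) $t = 1$, $p(\beta_1 \mid \alpha^1, \beta^0)$

| $\alpha_1$ | $inc_1$ | $inc_2$ | $b_*$ |
|---|---|---|---|
| Candlemaker | $q_4$ | $q_5$ | |
| Scarface | $q_6$ | $q_7$ | |
| $a_*$ | | | 1 |

(b) $t = 2$, $p(\beta_2 \mid \alpha^2, \beta^1)$

| $\alpha_1, \beta_1, \alpha_2$ | $inc_1$ | $inc_2$ | $b_*$ |
|---|---|---|---|
| Candlemaker, $inc_1$, Candlemaker | $q_{22}$ | $q_{23}$ | |
| Candlemaker, $inc_1$, Scarface | $q_{24}$ | $q_{25}$ | |
| Candlemaker, $inc_1$, $a_*$ | | | 1 |
| Candlemaker, $inc_2$, Candlemaker | $q_{27}$ | $q_{28}$ | |
| Candlemaker, $inc_2$, Scarface | $q_{29}$ | $q_{30}$ | |
| Candlemaker, $inc_2$, $a_*$ | | | 1 |
| Scarface, $inc_1$, Candlemaker | $q_{32}$ | $q_{33}$ | |
| Scarface, $inc_1$, Scarface | $q_{34}$ | $q_{35}$ | |
| Scarface, $inc_1$, $a_*$ | | | 1 |
| Scarface, $inc_2$, Candlemaker | $q_{37}$ | $q_{38}$ | |
| Scarface, $inc_2$, Scarface | $q_{39}$ | $q_{40}$ | |
| Scarface, $inc_2$, $a_*$ | | | 1 |
| | | | 1 |
| All other lines | | | 1 |

Table 4.7: Stochastic kernels for the Cocaine Auction example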



Now we need to define the reactor, i.e. the probability distribution on reaction functions. Corollary 14 shows that we can do so by using the following equations:

$$p(\varphi_1) = p(\alpha_1 \mid \alpha^0, \beta^0) = p(\alpha_1)$$

$$p(\varphi_t \mid \varphi^{t-1}) = \prod_{\beta^{t-1}} p\big(\varphi_t(\beta^{t-1}) \mid \varphi^{t-1}(\beta^{t-2}), \beta^{t-1}\big), \qquad 2 \le t \le T$$

For instance, $p(f_{1(1)}) = p(Candlemaker) = p_1$. In the same way, $p(f_{1(2)}) = p(Scarface) = p_2$ and $p(f_{1(3)}) = p(a_*) = p_3$.
(a) All 3 reaction functions $\varphi_1$

| | $f_{1(1)}$ | $f_{1(2)}$ | $f_{1(3)}$ |
|---|---|---|---|
| $\beta^0$ | Candlemaker | Scarface | $a_*$ |

(b) All 27 reaction functions $\varphi_2(\beta^1)$

| Reaction function | $\beta^1 = inc_1$ | $\beta^1 = inc_2$ | $\beta^1 = b_*$ |
|---|---|---|---|
| $f_{2(1)}$ | Candlemaker | Candlemaker | Candlemaker |
| $f_{2(2)}$ | Candlemaker | Candlemaker | Scarface |
| $f_{2(3)}$ | Candlemaker | Candlemaker | $a_*$ |
| $f_{2(4)}$ | Candlemaker | Scarface | Candlemaker |
| $f_{2(5)}$ | Candlemaker | Scarface | Scarface |
| $f_{2(6)}$ | Candlemaker | Scarface | $a_*$ |
| $f_{2(7)}$ | Candlemaker | $a_*$ | Candlemaker |
| $f_{2(8)}$ | Candlemaker | $a_*$ | Scarface |
| $f_{2(9)}$ | Candlemaker | $a_*$ | $a_*$ |
| $f_{2(10)}$ | Scarface | Candlemaker | Candlemaker |
| $f_{2(11)}$ | Scarface | Candlemaker | Scarface |
| $f_{2(12)}$ | Scarface | Candlemaker | $a_*$ |
| $f_{2(13)}$ | Scarface | Scarface | Candlemaker |
| $f_{2(14)}$ | Scarface | Scarface | Scarface |
| $f_{2(15)}$ | Scarface | Scarface | $a_*$ |
| $f_{2(16)}$ | Scarface | $a_*$ | Candlemaker |
| $f_{2(17)}$ | Scarface | $a_*$ | Scarface |
| $f_{2(18)}$ | Scarface | $a_*$ | $a_*$ |
| $f_{2(19)}$ | $a_*$ | Candlemaker | Candlemaker |
| $f_{2(20)}$ | $a_*$ | Candlemaker | Scarface |
| $f_{2(21)}$ | $a_*$ | Candlemaker | $a_*$ |
| $f_{2(22)}$ | $a_*$ | Scarface | Candlemaker |
| $f_{2(23)}$ | $a_*$ | Scarface | Scarface |
| $f_{2(24)}$ | $a_*$ | Scarface | $a_*$ |
| $f_{2(25)}$ | $a_*$ | $a_*$ | Candlemaker |
| $f_{2(26)}$ | $a_*$ | $a_*$ | Scarface |
| $f_{2(27)}$ | $a_*$ | $a_*$ | $a_*$ |

Table 4.8: Reaction functions for the cocaine auction example
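Table 4.8 can be regenerated mechanically. The Python sketch below (the names `SECRETS`, `FIRST_OBS` and the functions are ours) enumerates all maps from $\beta^1 \in \{inc_1, inc_2, b_*\}$ to secrets in lexicographic order, with Candlemaker before Scarface before $a_*$ and $b_*$ varying fastest. This ordering is inferred from the table; it agrees, for instance, with $f_{2(6)}$ mapping $inc_1$ to Candlemaker, $inc_2$ to Scarface and $b_*$ to $a_*$, as used in the calculation of $p(f_{2(6)} \mid f_{1(1)})$.

```python
from itertools import product

SECRETS = ("Candlemaker", "Scarface", "a*")   # secret actions, in numbering order
FIRST_OBS = ("inc1", "inc2", "b*")            # possible values of beta^1


def reaction_functions_t1():
    """phi_1 takes the empty history beta^0, so it is just a choice of secret."""
    return list(SECRETS)


def reaction_functions_t2():
    """All maps beta^1 -> secret, listed lexicographically (b* varies fastest)."""
    return [dict(zip(FIRST_OBS, images))
            for images in product(SECRETS, repeat=len(FIRST_OBS))]


fs = reaction_functions_t2()
print(len(fs))    # 27 = 3 ** 3
print(fs[6 - 1])  # f_2(6): inc1 -> Candlemaker, inc2 -> Scarface, b* -> a*
```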
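Let us take as an example the calculation of $p(f_{2(6)} \mid f_{1(1)})$.

$$\begin{aligned}
p(f_{2(6)} \mid f_{1(1)}) &= \prod_{\beta^1} p\big(f_{2(6)}(\beta^1) \mid f_{1(1)}, \beta^1\big)\\
&= p(f_{2(6)}(inc_1) \mid Candlemaker, inc_1) \cdot p(f_{2(6)}(inc_2) \mid Candlemaker, inc_2) \cdot p(f_{2(6)}(b_*) \mid Candlemaker, b_*)\\
&= p(Candlemaker \mid Candlemaker, inc_1) \cdot p(Scarface \mid Candlemaker, inc_2) \cdot p(a_* \mid Candlemaker, b_*)\\
&= p_9 \cdot p(Scarface \mid Candlemaker, inc_2) \cdot 1
\end{aligned}$$

Note that some reaction functions can have probability 0, which is consistent with the probabilistic automaton. For instance:

$$\begin{aligned}
p(f_{2(25)} \mid f_{1(3)}) &= \prod_{\beta^1} p\big(f_{2(25)}(\beta^1) \mid f_{1(3)}, \beta^1\big)\\
&= p(f_{2(25)}(inc_1) \mid a_*, inc_1) \cdot p(f_{2(25)}(inc_2) \mid a_*, inc_2) \cdot p(f_{2(25)}(b_*) \mid a_*, b_*)\\
&= p(a_* \mid a_*, inc_1) \cdot p(a_* \mid a_*, inc_2) \cdot p(Candlemaker \mid a_*, b_*)\\
&= 1 \cdot 1 \cdot 0\\
&= 0
\end{aligned}$$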
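The product formula for the reactor can be evaluated directly. The Python sketch below identifies $f_{1(i)}$ with its constant value and uses a hypothetical kernel $p(\alpha_2 \mid \alpha_1, \beta_1)$: the numbers are illustrative only and are not a transcription of Table 4.9, and we assume that after $\alpha_1 = a_*$ or $\beta_1 = b_*$ the next secret is $a_*$ with probability 1, which reproduces the factors 1 and 0 in the two calculations above. It also checks that $\sum_{f_2} p(f_2 \mid f_1) = 1$, which holds because the product factorizes over $\beta^1$.

```python
from itertools import product

SECRETS = ("Candlemaker", "Scarface", "a*")
FIRST_OBS = ("inc1", "inc2", "b*")

# Hypothetical p(alpha_2 | alpha_1, beta_1) for histories where a second bid is possible.
ILLUSTRATIVE_KERNEL = {
    ("Candlemaker", "inc1"): {"Candlemaker": 0.80, "Scarface": 0.19, "a*": 0.01},
    ("Candlemaker", "inc2"): {"Candlemaker": 0.19, "Scarface": 0.80, "a*": 0.01},
    ("Scarface", "inc1"): {"Candlemaker": 0.90, "Scarface": 0.09, "a*": 0.01},
    ("Scarface", "inc2"): {"Candlemaker": 0.09, "Scarface": 0.90, "a*": 0.01},
}


def secret_kernel(a1, b1):
    """p(alpha_2 | alpha_1, beta_1); assumption: a* or b* in the history forces a*."""
    if a1 == "a*" or b1 == "b*":
        return {"a*": 1.0}
    return ILLUSTRATIVE_KERNEL[(a1, b1)]


def reaction_functions_t2():
    """All 27 maps beta^1 -> secret, in the order of Table 4.8 (b* varies fastest)."""
    return [dict(zip(FIRST_OBS, images)) for images in product(SECRETS, repeat=3)]


def reactor_prob(f2, f1):
    """p(f2 | f1) = product over beta^1 of p(f2(beta^1) | f1, beta^1)."""
    prob = 1.0
    for b1 in FIRST_OBS:
        prob *= secret_kernel(f1, b1).get(f2[b1], 0.0)
    return prob


fs = reaction_functions_t2()
print(reactor_prob(fs[5], "Candlemaker"))   # f_2(6) given f_1(1)
print(reactor_prob(fs[24], "a*"))           # f_2(25) given f_1(3)
```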



4.5.1 Calculating the information leakage 

Let us now calculate the information leakage for this example using the concepts from Section 4.4. We will analyze three different scenarios:

Example a: There is feedback, but the probability of an observable does not depend on the history of secrets. In the auction protocol, this corresponds to a scenario where the probability that one of the mobsters bids can depend on the increment imposed by the seller, but the history of who has bid in the past has no influence on how the seller chooses the bid increment in the coming turns. In other words, the seller cannot use the information of who has been bidding to change his strategy of defining the new increments. This situation corresponds to the original description of the protocol in [SA99], where the seller does not have access to the identity of the bidder, for the sake of anonymity preservation. In general, we have $p(\beta_t \mid \alpha^t, \beta^{t-1}) = p(\beta_t \mid \beta^{t-1})$ for every $1 \le t \le T$. There is an exception, however: if there is no bidder, the case modeled by the secret being $a_*$, then the auction terminates, which is signaled by the observable $b_*$.

Example b: This is the most general case, without any restrictions. The presence of feedback allows the probability of the bidder to depend on the increment in the price. For instance, if Candlemaker is richer than Scarface, it is more likely that the former bids if the increment in the price is $inc_2$ instead of $inc_1$. Also, the probability of an observable can depend on the history of secrets, i.e. in general $p(\beta_t \mid \alpha^t, \beta^{t-1}) \neq p(\beta_t \mid \beta^{t-1})$ for $1 \le t \le T$. This scenario can represent a situation where the seller is corrupted and can use his information to affect the outcome of the auction. As an example, suppose that the seller is a friend of Scarface and he wants to help him in the auction. One way of doing so is to check who was the winner of the last bidding round. Whenever the winner is Candlemaker, the seller chooses as increment the small value $inc_1$, hoping that it will give Scarface a good chance to bid in the next round. On the other hand, whenever the seller detects that the winner is Scarface, he chooses as the next increment the greater value $inc_2$, hoping that it will minimize the chances of Candlemaker to bid in the next round (and therefore maximize the chances of the auction ending with Scarface as the final winner).

Example c: There is no feedback. In the cocaine auction, we can have the (perhaps unrealistic) situation in which the increment added to the bid has no influence on the probability of Candlemaker or Scarface being the bidder. Mathematically, we have $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) = p(\alpha_t \mid \alpha^{t-1})$ for every $1 \le t \le T$. As in Example b, however, we do not impose any restriction on $p(\beta_t \mid \alpha^t, \beta^{t-1})$.

For each scenario we need to fill in the values of the probabilities in the 
protocol tree in Figure 4.8. The probabilities for each example are listed in 
Table 4.9. Table 4.10 shows a comparison between some relevant values for 
the three cases. 

In Example a, since the probability of observables does not depend on the history of secrets, there is (almost) no information flowing from the input to the output, and the directed information $I(A^T \to B^T)$ is close to zero, i.e. the leakage is low. The only reason why the leakage is not zero is that the end of an auction needs to be signaled. Due to the presence of feedback, however, the directed information in the other direction, $I(B^T \to A^T)$, is non-zero, and so is the mutual information $I(A^T; B^T)$. This is an example where the mutual information does not correspond to the real information leakage, since some (in this case, most) of the correlation between input and output can be attributed to the feedback.

| Probability variable | Example a | Example b | Example c |
|---|---|---|---|
| $p_1$ | 0.75 | 0.70 | 0.70 |
| $p_2$ | 0.24 | 0.24 | 0.24 |
| $p_3$ | 0.01 | 0.01 | 0.01 |
| $q_4$ | 0.50 | 0.55 | 0.30 |
| $q_5$ | 0.50 | 0.45 | 0.70 |
| $q_6$ | 0.50 | 0.45 | 0.70 |
| $q_7$ | 0.50 | 0.55 | 0.30 |
| $p_9$ | 0.04 | 0.80 | 0.75 |
| $p_{10}$ | 0.95 | 0.19 | 0.20 |
| $p_{11}$ | 0.01 | 0.01 | 0.05 |
| $p_{12}$ | 0.95 | 0.19 | 0.75 |
| $p_{13}$ | 0.04 | 0.80 | 0.20 |
| $p_{14}$ | 0.01 | 0.01 | 0.05 |
| $p_{15}$ | 0.04 | 0.90 | 0.65 |
| $p_{16}$ | 0.95 | 0.09 | 0.35 |
| $p_{17}$ | 0.01 | 0.01 | 0.05 |
| $p_{18}$ | 0.95 | 0.09 | 0.65 |
| $p_{19}$ | 0.04 | 0.90 | 0.35 |
| $p_{20}$ | 0.01 | 0.01 | 0.05 |
| $q_{22}$ | 0.50 | 0.80 | 0.45 |
| $q_{23}$ | 0.50 | 0.20 | 0.55 |
| $q_{24}$ | 0.50 | 0.20 | 0.55 |
| $q_{25}$ | 0.50 | 0.80 | 0.45 |
| $q_{27}$ | 0.45 | 0.75 | 0.45 |
| $q_{28}$ | 0.55 | 0.25 | 0.55 |
| $q_{29}$ | 0.45 | 0.35 | 0.55 |
| $q_{30}$ | 0.55 | 0.65 | 0.45 |
| $q_{32}$ | 0.50 | 0.55 | 0.45 |
| $q_{33}$ | 0.50 | 0.45 | 0.55 |
| $q_{34}$ | 0.50 | 0.40 | 0.55 |
| $q_{35}$ | 0.50 | 0.60 | 0.45 |
| $q_{37}$ | 0.45 | 0.60 | 0.45 |
| $q_{38}$ | 0.55 | 0.40 | 0.55 |
| $q_{39}$ | 0.45 | 0.35 | 0.55 |
| $q_{40}$ | 0.55 | 0.55 | 0.45 |

Table 4.9: Values of the probabilities in Figure 4.8 for Examples a, b, and c

In Example b the information flow from input to output $I(A^T \to B^T)$ is significantly higher than zero, but still, due to feedback, the information flow from outputs to inputs $I(B^T \to A^T)$ is not zero and the mutual information $I(A^T; B^T)$ is higher than the directed information $I(A^T \to B^T)$.
| Interpretation | Symbol | Example a | Example b | Example c |
|---|---|---|---|---|
| Input uncertainty | $H(A^T)$ | 1.9319 | 1.9054 | 1.9158 |
| Reactor uncertainty | $H_R$ | 1.1911 | 1.5804 | 1.9158 |
| A posteriori uncertainty | $H(A^T \mid B^T)$ | 1.0303 | 1.2371 | 1.4183 |
| Mutual information | $I(A^T; B^T)$ | 0.9016 | 0.6684 | 0.4975 |
| Leakage | $I(A^T \to B^T)$ | 0.1608 | 0.3433 | 0.4975 |
| Feedback information | $I(B^T \to A^T)$ | 0.7408 | 0.3250 | 0.0000 |

Table 4.10: Values of the entropy and directed information for Examples a, b, and c, where $I(A^T; B^T) = H(A^T) - H(A^T \mid B^T)$ and $I(A^T \to B^T) = H_R - H(A^T \mid B^T)$, with $H_R = \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1})$



In Example c, the absence of feedback implies that $I(B^T \to A^T)$ is zero. In that case the values of $I(A^T; B^T)$ and $I(A^T \to B^T)$ coincide, and represent the real leakage.
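The entries of Table 4.10 are tied together by the identities in its caption and by (4.10). The Python sketch below (the dictionary layout and function names are ours) re-checks them on the transcribed four-digit values, with a tolerance that allows for rounding, and computes the share of the mutual information that is genuine leakage.

```python
# Values transcribed from Table 4.10 (bits).
TABLE_4_10 = {
    "a": {"H_A": 1.9319, "H_R": 1.1911, "H_post": 1.0303, "MI": 0.9016, "leak": 0.1608, "fb": 0.7408},
    "b": {"H_A": 1.9054, "H_R": 1.5804, "H_post": 1.2371, "MI": 0.6684, "leak": 0.3433, "fb": 0.3250},
    "c": {"H_A": 1.9158, "H_R": 1.9158, "H_post": 1.4183, "MI": 0.4975, "leak": 0.4975, "fb": 0.0000},
}


def check_row(v, tol=2e-4):
    """Check the identities linking the columns of one example, up to 4-digit rounding."""
    return (abs(v["H_A"] - v["H_post"] - v["MI"]) <= tol          # I(A;B) = H(A) - H(A|B)
            and abs(v["H_R"] - v["H_post"] - v["leak"]) <= tol    # Proposition 17
            and abs(v["MI"] - v["leak"] - v["fb"]) <= tol)        # conservation law (4.10)


def leakage_share(v):
    """Percentage of the mutual information accounted for by I(A^T -> B^T)."""
    return round(100 * v["leak"] / v["MI"])


for name, row in TABLE_4_10.items():
    print(name, check_row(row), leakage_share(row))
```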

Finally, Figure 4.9 shows a comparison between the values of the entropy and of the directed information in the examples. The totality of the mutual information $I(A^T; B^T)$ is represented by the height of the corresponding bar, and we emphasize the contribution of the directed information in each direction by splitting the bar into two parts. This figure highlights the fact that mutual information can be misleading as a measure of leakage. The greatest mutual information is obtained in Example a, followed by Example b and then by Example c. The real leakage, however, given by $I(A^T \to B^T)$, respects exactly the inverse order, namely Example a presents the lowest value while Example c presents the highest one. Indeed, in Example a the value of $I(A^T \to B^T)$ represents only 18% of the mutual information, while in Example b it represents 51% and in Example c it amounts to 100%.

Figure 4.9: Comparison between the leakage in Examples a, b, and c

4.6 Topological properties of IIHSs and their capacity

In this section we show how to extend to IIHSs the notion of pseudometric defined in [DJGP02] for Concurrent Labeled Markov Chains, and we prove that the capacity of the corresponding channels is a continuous function with respect to this pseudometric. The pseudometric construction is sound for general IIHSs, but the result on capacity is only valid for secret-nondeterministic IIHSs.

Given a set of states $S$, a pseudometric is a function $d$ that yields a non-negative real number for each pair of states and satisfies the following:

(i) $d(s,s) = 0$;

(ii) $d(s,t) = d(t,s)$; and

(iii) $d(s,t) \le d(s,u) + d(u,t)$.

We say that a pseudometric $d$ is $c$-bounded if $\forall s, t: d(s,t) \le c$, where $c$ is a positive real number.

Note that, in contrast to metrics, in pseudometrics two elements can have distance 0 without being identical. We consider pseudometrics instead of metrics because our purpose is to extend the notion of (probabilistic) bisimulation: having distance 0 will correspond to being bisimilar.

We now define a complete lattice structure on pseudometrics, in order to define the distance between IIHSs as the greatest fixpoint of a particular transformation, in line with the coinductive theory of bisimilarity. Since larger bisimulations identify more, the natural extension of the ordering to pseudometrics must shorten the distances as we go up in the lattice:

Definition 22. $\mathcal{M}$ is the class of 1-bounded pseudometrics on states with the ordering

$$d \preceq d' \quad \text{if} \quad \forall s, s' \in S: d(s,s') \ge d'(s,s').$$

It is easy to see that $(\mathcal{M}, \preceq)$ is a complete lattice. In order to define pseudometrics on IIHSs, we now need to lift the pseudometrics on states to pseudometrics on distributions in $\mathcal{D}(\mathcal{L} \times S)$. Following standard lines [vBW01, DJGP02, DCPP06], we apply the construction based on the Kantorovich metric [Kan42].
Definition 23. For $d \in \mathcal{M}$ and $\mu, \mu' \in \mathcal{D}(\mathcal{L} \times S)$, we define $d(\mu, \mu')$ (overloading the notation $d$) as

$$d(\mu, \mu') = \max \sum_i \big( \mu(\ell_i, s_i) - \mu'(\ell_i, s_i) \big)\, x_i$$

where the maximum is taken over all possible values of the $x_i$'s, subject to the constraints $0 \le x_i \le 1$ and $x_i - x_j \le \hat{d}((\ell_i, s_i), (\ell_j, s_j))$, where

$$\hat{d}((\ell_i, s_i), (\ell_j, s_j)) = \begin{cases} 1 & \text{if } \ell_i \neq \ell_j \\ d(s_i, s_j) & \text{otherwise} \end{cases}$$

It can be shown that with this definition $d$ is a pseudometric on $\mathcal{D}(\mathcal{L} \times S)$.
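As a small worked instance of Definition 23 (our own illustration), fix a label $\ell$ and two states $s_1, s_2$ with $d(s_1, s_2) = \eta$, and compare $\mu = \tfrac12 \delta_{(\ell, s_1)} + \tfrac12 \delta_{(\ell, s_2)}$ with $\mu' = \delta_{(\ell, s_1)}$. Writing $x_1, x_2$ for the weights of $(\ell, s_1)$ and $(\ell, s_2)$, the only binding constraint is that $x_2 - x_1$ is at most the distance between $(\ell, s_2)$ and $(\ell, s_1)$, which is $d(s_2, s_1) = \eta$ because the labels coincide:

$$\begin{aligned}
d(\mu, \mu') &= \max_{x_1, x_2} \Big( \big(\tfrac12 - 1\big)\, x_1 + \big(\tfrac12 - 0\big)\, x_2 \Big) = \max_{x_1, x_2} \tfrac12\, (x_2 - x_1)\\
&= \tfrac12\, \eta \qquad \text{(attained at } x_1 = 0,\ x_2 = \eta\text{)}.
\end{aligned}$$

If the second point carried a different label $\ell' \neq \ell$, the bound on $x_2 - x_1$ would be 1 and the distance would be $\tfrac12$, i.e. the total variation distance between $\mu$ and $\mu'$.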

Definition 24. A pseudometric $d \in \mathcal{M}$ is a bisimulation pseudometric¹ if, for all $\varepsilon \in [0,1)$, $d(s,s') \le \varepsilon$ implies that if $s \to \mu$, then there exists some $\mu'$ such that $s' \to \mu'$ and $d(\mu, \mu') \le \varepsilon$.

Note that it is not necessary to require the converse of the condition in Definition 24 to get a complete analogy with bisimulation: the converse is indeed implied by the symmetry of $d$ as a pseudometric. Note also that we do not allow $\varepsilon$ to be 1 because, throughout this chapter, 1 represents the maximum distance, which includes the case where one state may perform a transition and the other may not.

The greatest bisimulation pseudometric is

$$d_{max} = \bigsqcup \{ d \in \mathcal{M} \mid d \text{ is a bisimulation pseudometric} \} \qquad (4.12)$$

We now characterize $d_{max}$ as a fixed point of a monotonic function $\Phi$ on $\mathcal{M}$. Eventually we are interested in the distance between IIHSs, and for the sake of simplicity, from now on we consider only the distance between states belonging to different IIHSs. The extension to the general case is trivial. For clarity purposes, we assume that different IIHSs have disjoint sets of states.

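Definition 25. Given two IIHSs with transition relations $\vartheta$ and $\vartheta'$ respectively, and a pseudometric $d$ on states, define $\Phi : \mathcal{M} \to \mathcal{M}$ as:

$$\Phi(d)(s,s') = \begin{cases}
\max_i d(s_i, s'_i) & \text{if } \vartheta(s) = \{\delta_{(a_1, s_1)}, \ldots, \delta_{(a_m, s_m)}\} \text{ and } \vartheta'(s') = \{\delta_{(a_1, s'_1)}, \ldots, \delta_{(a_m, s'_m)}\}\\
d(\mu, \mu') & \text{if } \vartheta(s) = \{\mu\} \text{ and } \vartheta'(s') = \{\mu'\}\\
0 & \text{if } \vartheta(s) = \vartheta'(s') = \emptyset\\
1 & \text{otherwise}
\end{cases}$$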



¹ In the literature, a pseudometric with this property is also known as a bisimulation metric, although it is still a pseudometric.
It is easy to see that the definition of $\Phi$ is a particular case of the function $F$ defined in [DJGP02, DCPP06], which is characterized as follows (cf. Lemma 3.8 in the full version of [DJGP02], and Definition 2.7 in [DCPP06]):

$$F(d)(s,s') = \max\Big\{ \sup_{s \to \mu}\ \inf_{s' \to \mu'} d(\mu, \mu'),\ \sup_{s' \to \mu'}\ \inf_{s \to \mu} d(\mu, \mu') \Big\}$$

Hence it can be proved, as an instance of the analogous result for $F$ (cf. Lemma 2.8 in [DCPP06]), that $\Phi(d)$ is a pseudometric, and that the following property holds.

Lemma 26. For $\varepsilon \in [0,1)$, $\Phi(d)(s,s') \le \varepsilon$ holds if and only if whenever $s \to \mu$, there exists some $\mu'$ such that $s' \to \mu'$ and $d(\mu, \mu') \le \varepsilon$.

From the above lemma and Definition 24 we derive (see also Lemma 2.9 in [DCPP06]):

Corollary 27. A pseudometric $d$ is a bisimulation pseudometric if and only if $d \preceq \Phi(d)$.

By applying Corollary 27 to (4.12) we obtain

$$d_{max} = \bigsqcup \{ d \in \mathcal{M} \mid d \preceq \Phi(d) \}$$

Furthermore, by adapting the proof of the monotonicity of $F$ (cf. Lemma 3.9 in the full version of [DJGP02]) we can prove the following:

Lemma 28. $\Phi$ is monotonic on $(\mathcal{M}, \preceq)$.

Thanks to Lemma 28, and using Tarski's fixed point theorem as formulated in [Tar55], we have that $d_{max}$ is the greatest fixed point of $\Phi$. Furthermore, by Corollary 27 we know that $d_{max}$ is indeed a bisimulation pseudometric, and that it is the greatest bisimulation pseudometric.

In addition, the finite branching property of IIHSs ensures that the closure ordinal of $\Phi$ is $\omega$ (cf. Lemma 3.10 in the full version of [DJGP02]). Therefore we can proceed in a standard way to show that

$$d_{max} = \bigsqcap \{ \Phi^i(\top) \mid i \in \mathbb{N} \},$$

where $\top$ is the greatest pseudometric (i.e. $\top(s,s') = 0$ for every $s, s'$), and $\Phi^0(\top) = \top$.
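The iteration $d_{max} = \bigsqcap_i \Phi^i(\top)$ can be sketched for small finite systems. The Python code below rests on two assumptions of ours: every non-terminal state has exactly one outgoing distribution (the fully probabilistic case; the Dirac case of Definition 25 is omitted), and each label occurs at most once in a distribution. Under the second assumption the linear program of Definition 23 reduces, by a short argument of our own, to $\sum_\ell \big(\max(\mu_\ell - \mu'_\ell, 0) + \min(\mu_\ell, \mu'_\ell)\, d(s_\ell, s'_\ell)\big)$. The test systems are hypothetical and shaped like the programs of Example 6.

```python
from typing import Dict, Tuple

# Hypothetical encoding: state -> {label: (probability, successor)}; terminal states map to {}.
System = Dict[str, Dict[str, Tuple[float, str]]]


def lift(mu, nu, d):
    """Kantorovich lifting (Definition 23), closed form when labels are unique per distribution."""
    total = 0.0
    for label in set(mu) | set(nu):
        pm, sm = mu.get(label, (0.0, None))
        pn, sn = nu.get(label, (0.0, None))
        shared = min(pm, pn)             # mass that can be matched under the same label
        total += max(pm - pn, 0.0) + (shared * d[(sm, sn)] if shared > 0 else 0.0)
    return total


def phi(d, sys1: System, sys2: System):
    """One application of Phi (Definition 25), restricted to single-distribution states."""
    new = {}
    for s, mu in sys1.items():
        for t, nu in sys2.items():
            if not mu and not nu:
                new[(s, t)] = 0.0        # both states are terminal
            elif mu and nu:
                new[(s, t)] = lift(mu, nu, d)
            else:
                new[(s, t)] = 1.0        # only one of the two states can move
    return new


def d_max(sys1: System, sys2: System, tol=1e-12, max_iter=10_000):
    """Iterate Phi from the top element (all distances 0) until it stabilises."""
    d = {(s, t): 0.0 for s in sys1 for t in sys2}
    for _ in range(max_iter):
        new = phi(d, sys1, sys2)
        if max(abs(new[k] - d[k]) for k in d) <= tol:
            return new
        d = new
    return d


def example6_like(p, q):
    """Distance between the roots of two hypothetical systems shaped like Example 6."""
    s = {"s": {"a1": (p, "s1"), "a2": (1 - p, "s2")},
         "s1": {"b2": (1.0, "s3")}, "s2": {"b2": (1.0, "s4")}, "s3": {}, "s4": {}}
    t = {"t": {"a1": (q, "t1"), "a2": (1 - q, "t2")},
         "t1": {"b1": (1.0, "t3")}, "t2": {"b2": (1.0, "t4")}, "t3": {}, "t4": {}}
    return d_max(s, t)[("s", "t")]


print(example6_like(0.0, 0.01), example6_like(0.3, 0.5))
```

For these shapes the distance works out to $\max(p, q)$, so it shrinks to 0 as $p, q \to 0$.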

Given two IIHSs $\mathcal{I}$ and $\mathcal{I}'$, with initial states $s$ and $s'$ respectively, we define the distance between $\mathcal{I}$ and $\mathcal{I}'$ as $d(\mathcal{I}, \mathcal{I}') = d_{max}(s, s')$. The following properties are auxiliary to the theorem which states the continuity of the capacity.

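Lemma 29. Consider two IIHSs $\mathcal{I}$ and $\mathcal{I}'$ with transition functions $\vartheta$ and $\vartheta'$ respectively. Given $t \ge 2$ and two sequences $\alpha^t$ and $\beta^t$, assume that both $\mathcal{I}(\alpha^{t-1}, \beta^{t-1})$ and $\mathcal{I}'(\alpha^{t-1}, \beta^{t-1})$ are defined. Assume also that $d_{max}(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1})) < p(\beta_t \mid \alpha^t, \beta^{t-1})$, and $\vartheta(\mathcal{I}(\alpha^t, \beta^{t-1})) \neq \emptyset$. Then: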






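1. $\vartheta'(\mathcal{I}'(\alpha^t, \beta^{t-1})) \neq \emptyset$ holds as well,

2. $\mathcal{I}(\alpha^t, \beta^t)$ and $\mathcal{I}'(\alpha^t, \beta^t)$ are both defined, $p(\beta_t \mid \alpha^t, \beta^{t-1}) > 0$, and

$$d_{max}(\mathcal{I}(\alpha^t, \beta^t), \mathcal{I}'(\alpha^t, \beta^t)) \le \frac{d_{max}(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1}))}{p(\beta_t \mid \alpha^t, \beta^{t-1})}.$$

Proof.

1. Assume $\vartheta(\mathcal{I}(\alpha^t, \beta^{t-1})) \neq \emptyset$ and, by contradiction, $\vartheta'(\mathcal{I}'(\alpha^t, \beta^{t-1})) = \emptyset$. Since $d_{max}$ is a fixed point of $\Phi$, we have $d_{max} = \Phi(d_{max})$, and therefore

$$\begin{aligned}
d_{max}(\mathcal{I}(\alpha^t, \beta^{t-1}), \mathcal{I}'(\alpha^t, \beta^{t-1})) &= \Phi(d_{max})(\mathcal{I}(\alpha^t, \beta^{t-1}), \mathcal{I}'(\alpha^t, \beta^{t-1}))\\
&= 1\\
&\ge p(\beta_t \mid \alpha^t, \beta^{t-1}),
\end{aligned}$$

which contradicts the hypothesis, since by definition of $\Phi$ the distance between $\mathcal{I}(\alpha^{t-1}, \beta^{t-1})$ and $\mathcal{I}'(\alpha^{t-1}, \beta^{t-1})$ is at least the distance between $\mathcal{I}(\alpha^t, \beta^{t-1})$ and $\mathcal{I}'(\alpha^t, \beta^{t-1})$.

2. If $\vartheta(\mathcal{I}(\alpha^t, \beta^{t-1})) \neq \emptyset$, then, by the first point of this lemma, we have that $\vartheta'(\mathcal{I}'(\alpha^t, \beta^{t-1})) \neq \emptyset$ holds as well, and therefore both $\mathcal{I}(\alpha^t, \beta^t)$ and $\mathcal{I}'(\alpha^t, \beta^t)$ are defined. The hypothesis $d_{max}(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1})) < p(\beta_t \mid \alpha^t, \beta^{t-1})$ ensures that $p(\beta_t \mid \alpha^t, \beta^{t-1}) > 0$.

Let us now prove the bound on $d_{max}(\mathcal{I}(\alpha^t, \beta^t), \mathcal{I}'(\alpha^t, \beta^t))$. By definition of $\Phi$ we have

$$\Phi(d_{max})(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1})) \ge d_{max}(\mathcal{I}(\alpha^t, \beta^{t-1}), \mathcal{I}'(\alpha^t, \beta^{t-1})).$$

Since $d_{max} = \Phi(d_{max})$, we have

$$d_{max}(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1})) \ge d_{max}(\mathcal{I}(\alpha^t, \beta^{t-1}), \mathcal{I}'(\alpha^t, \beta^{t-1})). \qquad (4.13)$$

By definition of $\Phi$ and of the Kantorovich metric, we have

$$\Phi(d_{max})(\mathcal{I}(\alpha^t, \beta^{t-1}), \mathcal{I}'(\alpha^t, \beta^{t-1})) \ge p(\beta_t \mid \alpha^t, \beta^{t-1})\; d_{max}(\mathcal{I}(\alpha^t, \beta^t), \mathcal{I}'(\alpha^t, \beta^t)).$$

Using again $d_{max} = \Phi(d_{max})$, we get

$$d_{max}(\mathcal{I}(\alpha^t, \beta^{t-1}), \mathcal{I}'(\alpha^t, \beta^{t-1})) \ge p(\beta_t \mid \alpha^t, \beta^{t-1})\; d_{max}(\mathcal{I}(\alpha^t, \beta^t), \mathcal{I}'(\alpha^t, \beta^t)),$$

which, together with (4.13), allows us to conclude.

□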
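Lemma 30. Consider two IIHSs $\mathcal{I}$ and $\mathcal{I}'$, and let $p(\cdot \mid \cdot, \cdot)$ and $p'(\cdot \mid \cdot, \cdot)$ be their distributions on the output nodes. Given $T > 0$ and two sequences $\alpha^T$ and $\beta^T$, assume that $p(\beta_t \mid \alpha^t, \beta^{t-1}) > 0$ for every $t \le T$. Let $m = \min_{1 \le t \le T} p(\beta_t \mid \alpha^t, \beta^{t-1})$ and let $\varepsilon \in (0, m^{T-1})$. Assume $d(\mathcal{I}, \mathcal{I}') < \varepsilon$. Then, for every $t \le T$, we have

$$p(\beta_t \mid \alpha^t, \beta^{t-1}) - p'(\beta_t \mid \alpha^t, \beta^{t-1}) < \frac{\varepsilon}{m^{t-1}}$$

Proof. Observe that, for every $t \le T$, $\mathcal{I}(\alpha^t, \beta^t)$ must be defined, and, by repeatedly applying Lemma 29(1), we get that also $\mathcal{I}'(\alpha^t, \beta^t)$ is defined. By definition of $\Phi$ and of the Kantorovich metric, we have

$$p(\beta_t \mid \alpha^t, \beta^{t-1}) - p'(\beta_t \mid \alpha^t, \beta^{t-1}) \le \Phi(d_{max})(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1}))$$

and since $d_{max}$ is a fixed point of $\Phi$ we get

$$p(\beta_t \mid \alpha^t, \beta^{t-1}) - p'(\beta_t \mid \alpha^t, \beta^{t-1}) \le d_{max}(\mathcal{I}(\alpha^{t-1}, \beta^{t-1}), \mathcal{I}'(\alpha^{t-1}, \beta^{t-1})). \qquad (4.14)$$

By applying Lemma 29(2) $t - 1$ times, from (4.14) we get

$$p(\beta_t \mid \alpha^t, \beta^{t-1}) - p'(\beta_t \mid \alpha^t, \beta^{t-1}) \le \frac{d(\mathcal{I}, \mathcal{I}')}{m^{t-1}} < \frac{\varepsilon}{m^{t-1}}$$

□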



Note that the previous lemma states a sort of continuity property of the matrices obtained from IIHSs, but not uniform continuity, because of the dependence on one of the two IIHSs. It is easy to see (from the proof of the lemma) that uniform continuity does not hold.

The main contribution of this section, stated in the next theorem, is the 
continuity of the capacity with respect to the pseudometric on IIHSs. For this 
theorem, we assume that the IIHSs are normalized. Furthermore, it is crucial 
that they are secret-nondeterministic (while the definition of the pseudometric 
holds in general). 

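Theorem 31. Consider two normalized IIHSs $\mathcal{I}$ and $\mathcal{I}'$, and fix a $T > 0$. For every $\varepsilon > 0$ there exists $\nu > 0$ such that if $d(\mathcal{I}, \mathcal{I}') < \nu$ then $|C_T(\mathcal{I}) - C_T(\mathcal{I}')| < \varepsilon$.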

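Proof. Consider two normalized IIHSs $\mathcal{I}$ and $\mathcal{I}'$ and choose $T, \varepsilon > 0$. Let $\mathcal{D}_T$ be the set of all input distributions in presence of feedback. Observe that

$$\begin{aligned}
|C_T(\mathcal{I}) - C_T(\mathcal{I}')| &= \Big| \max_{\mathcal{D}_T} \tfrac{1}{T}\, I(A^T \to B^T) - \max_{\mathcal{D}_T} \tfrac{1}{T}\, I(A'^T \to B'^T) \Big|\\
&\le \tfrac{1}{T} \max_{\mathcal{D}_T} \big| I(A^T \to B^T) - I(A'^T \to B'^T) \big|
\end{aligned}$$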
Since the directed information $I(A^T \to B^T)$ is defined by means of arithmetic operations and logarithms on the joint probabilities $p(\alpha^T, \beta^T)$ and on the probabilities $p(\alpha^t, \beta^t)$ and $p(\alpha^t, \beta^{t-1})$, which in turn can be obtained by means of arithmetic operations from the probabilities $p(\beta_t \mid \alpha^t, \beta^{t-1})$ and from the input distribution on reaction functions, we have that $I(A^T \to B^T)$ is a continuous function of the distributions $p(\beta_t \mid \alpha^t, \beta^{t-1})$ and of the input distribution, for every $t \le T$. Let $\tilde{p}(\beta_t \mid \alpha^t, \beta^{t-1})$ and $\tilde{p}'(\beta_t \mid \alpha^t, \beta^{t-1})$ be the distributions on the output nodes of $\mathcal{I}$ and $\mathcal{I}'$, modified in the following way: starting from level $T$, whenever $p(\beta_t \mid \alpha^t, \beta^{t-1}) = 0$, we redefine the distributions at all the output nodes of the subtree rooted in $\mathcal{I}(\alpha^t, \beta^t)$ so that they coincide with the distributions of the corresponding nodes in $\mathcal{I}'$, and analogously for $p'(\beta_t \mid \alpha^t, \beta^{t-1})$. Note that this transformation does not change the directed information, because the subtree rooted in $\mathcal{I}(\alpha^t, \beta^t)$ does not contribute to it, due to the fact that the probability of reaching any of its nodes is 0. The continuity of $I(A^T \to B^T)$ implies that there exists $\varepsilon' > 0$ such that, if $|\tilde{p}(\beta_t \mid \alpha^t, \beta^{t-1}) - \tilde{p}'(\beta_t \mid \alpha^t, \beta^{t-1})| < \varepsilon'$ for all $t \le T$ and all sequences $\alpha^t, \beta^t$, then, for any input distribution in $\mathcal{D}_T$, we have $|I(A^T \to B^T) - I(A'^T \to B'^T)| < \varepsilon$. The result then follows from Lemma 30, by choosing

$$\nu = \varepsilon' \cdot \min_{\alpha^T, \beta^T} \Big( \min_{\substack{1 \le t \le T \\ p(\beta_t \mid \alpha^t, \beta^{t-1}) > 0}} p(\beta_t \mid \alpha^t, \beta^{t-1}) \Big)^{T-1}$$

□

We conclude this section with an example showing that the continuity 
result for the capacity does not hold if the construction of the channel is done 
starting from a system in which the secrets are endowed with a probability 
distribution. This is also the reason why we could not simply adopt the proof 
technique of the continuity result in [DJGP02] and we had to come up with 
different reasoning. 

Example 6. Consider the two following programs, where $a_1, a_2$ are secrets, $b_1, b_2$ are observable, $\parallel$ is the parallel operator, and $+_p$ is a binary probabilistic choice that assigns probability $p$ to the left branch, and probability $1 - p$ to the right one.

s) $(send(a_1) +_p send(a_2)) \parallel receive(x).output(b_2)$

t) $(send(a_1) +_q send(a_2)) \parallel receive(x).\,\mathbf{if}\ x = a_1\ \mathbf{then}\ output(b_1)\ \mathbf{else}\ output(b_2)$.
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Table 4. 11 shows the fully probabilistic IIHSs corresponding to these pro- 
grams, and their associated channels, which in this case (since the secret ac- 
tions are all at the top-level) are classical channels, i.e. memoryless and with- 
out feedback. As usual for classical channels, they do not depend on p and q. 
It is easy to see that the capacity of the first channel is and the capacity of 
the second one is 1. Hence their difference is 1, independently of p and q. 

Now let $p = 0$ and $q = \epsilon$. It is easy to see that the distance between $s$ and $t$ is $\epsilon$. Therefore (when the automata have probabilities on the secrets), the capacity is not a continuous function of the distance.
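The two capacities quoted in Example 6 can be checked numerically. The sketch below is only an illustration under our own assumptions: it encodes the channel matrices as read off Table 4.11 (rows $a_1, a_2$, columns $b_1, b_2$) and estimates capacity with the Blahut-Arimoto iteration, a standard algorithm for discrete memoryless channels that we swap in purely for checking; the names `W_S`, `W_T`, `capacity` and `mutual_information` are hypothetical.

```python
from math import log2

# Channel matrices of Example 6: W[a][b] = p(b | a), rows a1, a2, columns b1, b2.
W_S = [[0.0, 1.0],   # program s always outputs b2
       [0.0, 1.0]]
W_T = [[1.0, 0.0],   # program t outputs b1 iff the secret is a1
       [0.0, 1.0]]


def mutual_information(prior, W):
    """I(A; B) in bits for the input distribution `prior` and channel W."""
    q = [sum(prior[a] * W[a][b] for a in range(len(W))) for b in range(len(W[0]))]
    return sum(prior[a] * W[a][b] * log2(W[a][b] / q[b])
               for a in range(len(W)) for b in range(len(W[0]))
               if prior[a] > 0 and W[a][b] > 0)


def capacity(W, iters=1000, tol=1e-12):
    """Blahut-Arimoto estimate (in bits) of the capacity of channel W."""
    n_in, n_out = len(W), len(W[0])
    r = [1.0 / n_in] * n_in          # start from the uniform input distribution
    cap = 0.0
    for _ in range(iters):
        q = [sum(r[a] * W[a][b] for a in range(n_in)) for b in range(n_out)]
        # Kullback-Leibler divergence D(W[a] || q) for each input a
        d = [sum(W[a][b] * log2(W[a][b] / q[b]) for b in range(n_out) if W[a][b] > 0)
             for a in range(n_in)]
        weights = [r[a] * 2 ** d[a] for a in range(n_in)]
        z = sum(weights)
        new_cap = log2(z)            # lower bound on the capacity at this step
        r = [w / z for w in weights]
        if abs(new_cap - cap) < tol:
            return new_cap
        cap = new_cap
    return cap


eps = 0.01
print(capacity(W_S), capacity(W_T))             # 0.0 1.0
print(mutual_information([eps, 1 - eps], W_T))  # the binary entropy of eps, about 0.081
```

The leakage of $t$ under the secret distribution $(\epsilon, 1 - \epsilon)$ shrinks with $\epsilon$, while the gap between the two capacities stays at 1 bit, which is the discontinuity the example exhibits.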




(a) Channel for s              (b) Channel for t

        b1    b2                         b1    b2
  a1    0     1                    a1    1     0
  a2    0     1                    a2    0     1

Table 4.11: The IIHSs of Example 6 and their corresponding channels



4.7 Related work 

Gray investigated a concept similar to directed information in [Gra91]. In 
contrast to our model, which is based on an eavesdropper scenario, he con- 
sidered leakage in a sender-receiver model. More precisely, he considered a 
system based on Millen's synchronous state machine [Mil90], and connected to 
"low" and "high" environments via communication channels. His purpose was 
to measure the flow of information from the high environment to the low one, 
assuming that the only way for the low environment to learn about the high 
one (and vice versa) is through the system. To this end, he defined a notion 
of "quasi-directed information" by extending Gallager's formula for discrete 
finite state channels [Gal68]. He also conjectured a correspondence between 
the quasi-directed information and the transmission rate of the channel. His 
formulation of quasi-directed information, however, does not coincide with directed information, and as a result the conjecture does not hold.

The continuity of the channel capacity was also proved in [DJGP02] for 
simple channels, but the proof does not adapt to the case of channels with 
memory and feedback and we had to devise a different technique. 
4.8 Chapter summary and discussion

In this chapter we have investigated the problem of information leakage in 
interactive systems, and proved that these systems can be modeled as channels 
with memory and feedback. We have also proved that the channel capacity is 
a continuous function of a pseudometric based on the Kantorovich metric. 

We have considered various kinds of automata corresponding to different 
combinations of nondeterministic and probabilistic choice, as summarized in 
Table 4.12(a). Note that in this table the third row corresponds to the limit case in which the reactor is a Dirac measure, i.e. the probability is all concentrated on exactly one sequence of reaction functions $\varphi^T$. It is easy to see that in this case $I(A^T \to B^T) = 0$ (all the conditional mutual information terms that constitute $I(A^T \to B^T)$ are 0), although $I(B^T \to A^T) \neq 0$. Therefore there is no leakage. In the classic case this corresponds to the situation in which the input distribution is a Dirac measure.
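The directed information can be evaluated by brute force on small systems. The following is a minimal sketch under our own assumptions: a joint distribution over secret strings $\alpha^T$ and observable strings $\beta^T$ is given as a Python dictionary, and we use the standard expansion $I(A^T \to B^T) = \sum_{t=1}^{T} \big[H(B_t \mid B^{t-1}) - H(B_t \mid A^t, B^{t-1})\big]$. The second example is a hypothetical instance of the Dirac-reactor case discussed above: every secret is a fixed function of the past observables, and the directed information vanishes.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(joint, key):
    """Push the joint {(alpha, beta): p} forward through key(alpha, beta)."""
    out = defaultdict(float)
    for (alpha, beta), p in joint.items():
        out[key(alpha, beta)] += p
    return out


def directed_information(joint, T):
    """I(A^T -> B^T) = sum_t [H(B_t | B^{t-1}) - H(B_t | A^t, B^{t-1})]."""
    total = 0.0
    for t in range(1, T + 1):
        # conditional entropies via H(U | V) = H(U, V) - H(V)
        h_b_t = entropy(marginal(joint, lambda a, b: b[:t]))
        h_b_prev = entropy(marginal(joint, lambda a, b: b[:t - 1]))
        h_ab_t = entropy(marginal(joint, lambda a, b: (a[:t], b[:t])))
        h_ab_prev = entropy(marginal(joint, lambda a, b: (a[:t], b[:t - 1])))
        total += (h_b_t - h_b_prev) - (h_ab_t - h_ab_prev)
    return total


# Noiseless channel with uniform, independent secrets: 2 bits flow from A to B.
IDENTITY = {((a1, a2), (a1, a2)): 0.25 for a1 in (0, 1) for a2 in (0, 1)}
# Dirac reactor: a1 = 0 and a2 = b1, where b1 and the noise n2 are fair coins.
FEEDBACK_ONLY = {((0, b1), (b1, b1 ^ n2)): 0.25 for b1 in (0, 1) for n2 in (0, 1)}

print(directed_information(IDENTITY, 2))       # 2.0
print(directed_information(FEEDBACK_ONLY, 2))  # 0.0
```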

Table 4.12(b) summarizes the comparison between the channels with mem- 
ory and feedback investigated in this chapter, and the classic channels. 

Throughout this chapter we have assumed that the dependence of the secret 
choices on the observables is part of the external knowledge and, therefore, 
not considered leakage. The reader may wonder what would happen if this 
assumption were dropped. We argue that in this case $I(B^T \to A^T)$ could be considered as part of the leakage. In cases a and b of the cocaine auction example in Section 4.5, for instance, one may want to consider the information that we can deduce about the secrets (the identities of the bidders) from the observables (the increments of the seller) as a leak due to the protocol.

In some other cases the flow of information from the observables to the 
secrets may even be considered as a consequence of the active attacks of an 
adversary, who uses the observables to modify the probability of the secrets. In this case $I(B^T \to A^T)$ could represent a measure of the effectiveness of the adversary.

As future work, we would like to provide algorithms to compute the leak- 
age and maximum leakage of interactive systems. These are rather challenging 
problems given the exponential growth of reaction functions (needed to com- 
pute the leakage) and the quantification over infinitely many reactors (given 
by the definition of maximum leakage in terms of capacity). One possible solution is to study the relation between deterministic schedulers and sequences of reaction functions. In particular, we believe that for each sequence of reaction functions and distribution over it there exists a probabilistic scheduler for
the automata representation of the secret-nondeterministic IIHS. In this way, 
the problem of computing the leakage and maximum leakage would reduce to 
a standard probabilistic model checking problem (where the challenge is to 
compute probabilities ranging over infinitely many schedulers). 

In addition, we plan to investigate measures of leakage for interactive sys- 
tems other than mutual information and capacity. 

We intend to study the applicability of our framework to the area of game theory. In particular, the interactive nature of games such as the Prisoner's Dilemma [Pou92] and the Stag Hunt [Sky03] (in their iterated versions) can be modeled as channels with memory and feedback following the techniques proposed in this work. Furthermore, (probabilistic) strategies can be encoded as reaction functions. In this way, optimal strategies are attained by reaction functions maximizing the leakage of the channel.

(a) The various models considered in this chapter

IIHSs as automata | IIHSs as channels | Notion of leakage
Normalized IIHSs with nondeterministic secrets and probabilistic observables | Sequence of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ | Leakage as capacity
Fully probabilistic normalized IIHSs | Sequence of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ + reactor | Leakage as directed information
Normalized IIHSs with a deterministic scheduler solving the nondeterminism | Sequence of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ + reaction function sequence $\varphi^T$ | No leakage

(b) Classical channels vs. channels with memory and feedback

Classical channels | Channels with memory and feedback
The system is modeled in independent uses of the channel, often a single use. | The system is modeled in several consecutive uses of the channel.
The channel maps secrets to observables, i.e. its input is a single string $\alpha^T = \alpha_1 \ldots \alpha_T$ of secret symbols and its output is a single string $\beta^T = \beta_1 \ldots \beta_T$ of observable symbols. | The channel maps reaction functions to observables, i.e. its input is a reaction function sequence $\varphi^T$ and its output is an observable string $\beta^T$.
The channel is memoryless, and the absence of feedback is in general implicitly assumed. | The channel has memory. Although the channel from reaction functions to observables does not have feedback, the internal stochastic kernels do.
The capacity is calculated using the mutual information $I(A^T; B^T)$. | The capacity is calculated using the directed information $I(A^T \to B^T)$.

Table 4.12: Summary of results
Five

Differential privacy: the trade-off between leakage and utility



'If you have nothing to hide, then you don't have a life.'

cited by Daniel J. Solove 



In this chapter we consider the differential privacy approach to the prob- 
lem of statistical disclosure control. In general a statistical database contains 
data of a group of individuals, and users can pose queries to obtain statistical information about the sample in the dataset. To preserve the privacy of the participants in the database, it is desirable to restrict the amount of information that the system leaks about their individual values. One way of dealing with the problem is by using randomization mechanisms: to avoid leakage, the real answer is modified with some carefully added noise before being reported to the users. A popular and widely studied way of doing so is based on the concept of differential privacy.

In our work we consider the relation between differential privacy and quan- 
titative information flow. We address the problem of characterizing the pro- 
tection that differential privacy provides to individuals with respect to infor- 
mation leakage, and the problem of the utility, i.e. the measure of how close 
the reported answer is to the true answer. 

Contribution The main contributions of this chapter can be summarized 
as follows. 

• We propose an information-theoretic framework to reason about both 
information leakage and utility. 
• We explore the graph-theoretic foundations of the adjacency relation on databases¹, and we point out two types of symmetries which allow us to establish a strict link between differential privacy and information leakage.

• We prove that $\epsilon$-differential privacy implies a tight bound on the min-entropy leakage.

• We prove that $\epsilon$-differential privacy implies a bound on the utility, measured in terms of binary gain functions. We prove that, under certain conditions, the bound is tight.

• We identify a method that, under certain conditions, constructs randomization mechanisms that maximize utility while providing $\epsilon$-differential privacy.

Plan of the Chapter This chapter is organized as follows. In Section 5.1 
we formalize the notion of differential privacy and present an alternative inter- 
pretation for it in the special case where the adjacency relation on databases 
is complete (i.e. every two distinct databases are adjacent). In Section 5.2 we 
introduce our model to reason about leakage and utility for randomized func- 
tions in the case where the query and the randomization mechanism can be 
split into two distinct channels. In Section 5.3 we review some concepts from 
graph theory and present two special classes of graphs having symmetries that 
we will explore to make the connection between differential privacy and quan- 
titative information flow. We also show that the graph structure on databases, 
induced by the adjacency relation and the query, presents these symmetries. 
In Section 5.4 we use the results of the previous section to prove a bound on 
the a posteriori min-entropy of the channel matrix. Then we apply this bound 
to derive our results for leakage in Section 5.5 and for utility in Section 5.6. 
Finally, in Section 5.7 we review some of the related work in the literature, 
and in Section 5.8 we make our final remarks and conclude this chapter. 

5.1 Differential privacy 

Databases are commonly used for obtaining statistical information about their 
participants. Simple examples of statistical queries are, for instance, the pre- 
dominant disease in a certain population, or the average salary of a group of 
people. The fact that the answer is publicly available may, however, constitute 
a threat to the privacy of the individuals.

In order to illustrate the problem, consider a database that stores the values 
of the salaries of a set of individuals, and assume that a user can pose the query 
"what is the average salary of the participants in the database?". In principle 

¹The adjacency relation on databases will be defined precisely in Section 5.2.
we would like to consider the global information relative to the database as public, and the individual information about a participant as private. In this example, we would like to obtain the average salary without being able to infer the salary of any specific participant. Unfortunately this is not always possible. In particular, if the number of participants in the database is known, and an individual is removed from (or included in) the database, it is possible to infer his salary by querying the database again and calculating the influence of the removal (or inclusion) on the reported answer to the query.

Another kind of private information we may want to protect is whether 
a specific individual is participating or not in a database. If we know that a 
particular individual earns, say, 5.000€ a month, and all the other individuals 
earn less than 4.000€ a month, then learning that the average salary is greater than 4.000€ will immediately reveal the presence of our individual of interest
in the database. 

A common approach to this problem is to introduce some output pertur- 
bation mechanism based on randomization: instead of the exact answer, the 
querying mechanism reports a "noisy" answer. Namely, a randomized function 
is used to produce answers according to some probability distribution that de- 
pends on the database. The goal is to report this randomized answer, which 
ideally should be "close enough" to the real one, yet should make it harder 
for the user to guess the values of individual participants. For certain distri- 
butions, however, it may still be possible to guess the value of an individual 
with a high probability of success. The notion of differential privacy, due to Dwork [Dwo06, DL09, Dwo10, Dwo11], is a proposal to control the risk of violating privacy for both kinds of threats described above (value and participation). The idea is to say that a randomized function $\mathcal{K}$ satisfies $\epsilon$-differential privacy (for some $\epsilon > 0$) if the ratio between the probabilities that two adjacent databases give a certain answer is bounded by $e^\epsilon$, where by "adjacent"
we mean that the databases differ in only one individual (either for the value 
of an individual or for the presence/absence of an individual). The notion of 
differential privacy was developed to be independent of the side (or auxiliary) 
information the user can have about the database, and how it can affect his 
knowledge about the database before posing the query. This information can 
come from external sources (e.g. newspapers, common knowledge, etc), but 
does not affect the guarantees assured by differential privacy. 

In this chapter we explore the similarities between differential privacy and 
quantitative information flow. We base our approach on the following observations: at the motivational level, the concern about privacy is akin to the concern about information leakage. At the conceptual level, the randomized function $\mathcal{K}$ can be seen as an information-theoretic channel, and the limit case of $\epsilon = 0$, for which the privacy protection is total, corresponds to a 0-capacity channel, which does not allow any leakage. More specifically, we investigate the notion of differential privacy and its implications in the light of the min-entropy framework for information flow discussed in Chapter 3.
5.1.1 Formal definition

Let $\mathcal{X}$ be the set of all possible databases. Two databases $x, x' \in \mathcal{X}$ are adjacent (or neighbors), written $x \sim x'$, if they differ in the value of exactly one individual. Note that the structure $(\mathcal{X}, \sim)$ forms an undirected graph.

Intuitively, differential privacy is based on the idea that a randomized query function provides sufficient protection if the ratio between the probabilities of two adjacent databases to give a certain answer is bounded by $e^\epsilon$, for some $\epsilon > 0$. Formally:

Definition 32 ([Dwo11]). A randomized function $\mathcal{K}$ from $\mathcal{X}$ to $\mathcal{Z}$ satisfies $\epsilon$-differential privacy if for all pairs $x, x' \in \mathcal{X}$, with $x \sim x'$, and all $S \subseteq \mathcal{Z}$, we have:

$$\Pr[\mathcal{K}(x) \in S] \le e^\epsilon \times \Pr[\mathcal{K}(x') \in S]$$

In this thesis we consider $\mathcal{Z}$ to be finite; therefore every probability distribution on it is discrete, and we can rewrite the property of $\epsilon$-differential privacy more simply. Using the notation of conditional probabilities, and considering both quotients, we can say that $\epsilon$-differential privacy holds in the discrete case if, for all $x, x' \in \mathcal{X}$ with $x \sim x'$, and all $z \in \mathcal{Z}$:

$$\frac{1}{e^\epsilon} \le \frac{\Pr[Z = z \mid X = x]}{\Pr[Z = z \mid X = x']} \le e^\epsilon \qquad (5.1)$$

where $X$ and $Z$ represent the random variables associated to $\mathcal{X}$ and $\mathcal{Z}$, respectively.
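Condition (5.1) can be tested mechanically on a finite channel matrix. The following is a minimal sketch under our own assumptions: the channel is a list of rows `K[x][z]` $= p(z \mid x)$, adjacency is given as a list of index pairs, and the example mechanism is one-bit randomized response, a classic $\epsilon$-differentially private mechanism that we pick only for illustration. The helper names are hypothetical.

```python
from math import exp


def is_differentially_private(K, adjacent_pairs, eps, tol=1e-12):
    """Check (5.1): p(z|x) <= e^eps * p(z|x') for both orders of every adjacent pair."""
    bound = exp(eps)
    for x, x2 in adjacent_pairs:
        for a, b in ((x, x2), (x2, x)):
            if any(K[a][z] > bound * K[b][z] + tol for z in range(len(K[a]))):
                return False
    return True


def randomized_response(eps):
    """One-bit database {0, 1}: report the true bit with probability e^eps / (1 + e^eps)."""
    keep = exp(eps) / (1 + exp(eps))
    return [[keep, 1 - keep],
            [1 - keep, keep]]


EPS = 0.5
RR = randomized_response(EPS)
PAIRS = [(0, 1)]  # the two one-bit databases are adjacent
print(is_differentially_private(RR, PAIRS, EPS))      # True
print(is_differentially_private(RR, PAIRS, EPS / 2))  # False
```

The ratio $p(z \mid x)/p(z \mid x')$ of randomized response is exactly $e^\epsilon$ for $z = x$, so the check passes for $\epsilon$ and fails for any smaller value.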

Intuitively, (5.1) implies that, if a value of one single individual changes 
in a dataset (either by inclusion, removal or modification), the probability of 
the querying mechanism to report a specific answer will not "vary much". In 
other words, the influence of a single individual in a database is "negligible" 
with respect to the whole set of individuals. Of course the notion of what is 
meant by "much" and "negligible" depends on the value of $\epsilon$.

5.1.2 Alternative interpretation in the case of cliques 

A special interpretation of differential privacy is possible in the case where every two distinct databases in $\mathcal{X}$ are neighbors. More precisely, if $(\mathcal{X}, \sim)$ is a clique (i.e. a complete graph), it is possible to ensure that the ratio between any a priori knowledge $\Pr[X = x]$ of the user (before the query is posed) and his a posteriori knowledge $\Pr[X = x \mid Z = z]$ (after the answer to the query is reported) is bounded by $e^\epsilon$. Formally, if for every $x, x' \in \mathcal{X}$ with $x \neq x'$ we have $x \sim x'$, then:

$$\frac{1}{e^\epsilon} \le \frac{\Pr[X = x \mid Z = z]}{\Pr[X = x]} \le e^\epsilon \quad \text{for all priors } \Pr[X = x],\ x \in \mathcal{X}, \text{ and all } z \in \mathcal{Z} \qquad (5.2)$$
where $X$ and $Z$ represent the random variables associated to $\mathcal{X}$ and $\mathcal{Z}$, respectively.

Intuitively, (5.2) states that the observation of the reported answer should 
not "change much" the user's knowledge about the database. The next proposition shows that in the special case where every pair of distinct databases are neighbors, the above formulation of differential privacy is equivalent to the classic one.

Proposition 33. If for all $x, x' \in \mathcal{X}$ with $x \neq x'$ we have $x \sim x'$, then (5.1) and (5.2) are equivalent.

Proof. Let us represent by $X$ and $Z$ the random variables associated to $\mathcal{X}$ and $\mathcal{Z}$, respectively. For better readability, we will denote $\Pr[X = x]$, $\Pr[Z = z]$, $\Pr[Z = z \mid X = x]$ and $\Pr[X = x \mid Z = z]$ by $\Pr(x)$, $\Pr(z)$, $\Pr(z \mid x)$ and $\Pr(x \mid z)$, respectively.

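• (5.1) ⇒ (5.2)

$$\begin{aligned}
\Pr(x \mid z) &= \frac{\Pr(z \mid x)\Pr(x)}{\Pr(z)} && \text{(by the Bayes law)}\\
&= \frac{\Pr(z \mid x)\Pr(x)}{\sum_{x' \in \mathcal{X}} \Pr(x')\Pr(z \mid x')}\\
&\ge \frac{\Pr(z \mid x)\Pr(x)}{\sum_{x' \in \mathcal{X}} \Pr(x')\, e^\epsilon \Pr(z \mid x)} && \text{(by (5.1))}\\
&= \frac{\Pr(x)}{e^\epsilon}
\end{aligned}$$

from which it follows that $\frac{\Pr(x)}{\Pr(x \mid z)} \le e^\epsilon$. The case $\frac{1}{e^\epsilon} \le \frac{\Pr(x)}{\Pr(x \mid z)}$ is analogous: just take the symmetrical step when applying (5.1) in the derivation above.

• (5.2) ⇒ (5.1)

For every prior $\Pr(x)$ we have

$$\frac{\Pr(x \mid z)}{\Pr(x)} = \frac{\Pr(z \mid x)}{\Pr(z)} = \frac{\Pr(z \mid x)}{\sum_{x'' \in \mathcal{X}} \Pr(x'')\Pr(z \mid x'')} \qquad \text{(by the Bayes law)}$$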
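In particular, the above is valid for every prior of the form $\Pr(x) = \delta_{x'}(x)$, where $x' \in \mathcal{X}$. Therefore, for all $x' \in \mathcal{X}$:

$$\frac{\Pr(x \mid z)}{\Pr(x)} = \frac{\Pr(z \mid x)}{\sum_{x''} \delta_{x'}(x'')\Pr(z \mid x'')} = \frac{\Pr(z \mid x)}{\Pr(z \mid x')}$$

Since by (5.2) we have $\frac{1}{e^\epsilon} \le \frac{\Pr(x \mid z)}{\Pr(x)} \le e^\epsilon$ for every prior $\Pr(x)$, it follows from the derivation above that also $\frac{1}{e^\epsilon} \le \frac{\Pr(z \mid x)}{\Pr(z \mid x')} \le e^\epsilon$ for all $x' \in \mathcal{X}$.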

□ 
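Proposition 33 can also be observed numerically. The sketch below is an illustration with hypothetical names: it builds a mechanism on a three-element clique of databases that satisfies (5.1) (each database reports itself with weight $e^\epsilon$ and every other answer with weight 1), samples many priors with a fixed seed, and collects every ratio $\Pr(x \mid z)/\Pr(x) = \Pr(z \mid x)/\Pr(z)$, which by (5.2) should stay within $[e^{-\epsilon}, e^{\epsilon}]$.

```python
import random
from math import exp


def clique_mechanism(n, eps):
    """K[x][z] = p(z | x): weight e^eps on z = x and 1 elsewhere, then normalized."""
    norm = exp(eps) + n - 1
    return [[(exp(eps) if z == x else 1.0) / norm for z in range(n)] for x in range(n)]


def posterior_prior_ratios(prior, K):
    """All ratios Pr(x|z) / Pr(x) = p(z|x) / p(z), by the Bayes law."""
    ratios = []
    for z in range(len(K[0])):
        pz = sum(prior[x] * K[x][z] for x in range(len(K)))
        ratios.extend(K[x][z] / pz for x in range(len(K)) if prior[x] > 0)
    return ratios


EPS = 0.5
K = clique_mechanism(3, EPS)
rng = random.Random(0)  # fixed seed for reproducibility
RATIOS = []
for _ in range(1000):
    weights = [rng.random() for _ in range(3)]
    prior = [w / sum(weights) for w in weights]
    RATIOS.extend(posterior_prior_ratios(prior, K))

print(min(RATIOS), max(RATIOS), exp(-EPS), exp(EPS))
```

Every denominator $\Pr(z)$ is a convex combination of entries within a factor $e^\epsilon$ of $\Pr(z \mid x)$, which is exactly the step used in the first half of the proof.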

5.2 A model of utility and privacy for statistical databases

In this section we present a model of statistical queries on databases, where 
noise is carefully added to protect the privacy of the participants in the sample, 
and the reported answer to a query does not need to be the real one. In 
this model, information leakage measures the amount of information that an adversary can learn about the database by posing queries and then analyzing the reported answers. Note that in principle the adversary can be a user of the database, and therefore the privacy guarantees should not depend on who is posing the queries. Our model will also allow
us to quantify the utility of the query, i.e. how much information about the 
real answer can be obtained from the reported one. In our work we focus on 
the case in which all the values of interest are discrete. 

We fix a finite set $\mathit{Ind} = \{0, 1, \ldots, u-1\}$ of $u$ individuals participating in the database. In addition, we fix a finite set $\mathit{Val} = \{v_0, v_1, \ldots, v_{v-1}\}$, representing the set of ($v$ different) possible values for the sensitive attribute of each individual (e.g. disease-name in a medical database). In the more general case where there are several sensitive attributes in the database (e.g. salary and security number in a census sample), we can think of the elements of Val as tuples. The absence of an individual in the database, if allowed, can be modeled with one special value in Val (see the discussion in Section 5.2.2). A database $D = d_0 \ldots d_{u-1}$ is a $u$-tuple where each $d_i \in \mathit{Val}$ is the value of the corresponding individual. The set of all databases is $\mathcal{X} = \mathit{Val}^u$. Two databases $x, x'$ are adjacent, written $x \sim x'$, if and only if they differ in the value of exactly one individual. As we already pointed out, the structure $(\mathcal{X}, \sim)$ forms an undirected graph, and we call $\sim$ its adjacency relation.

Let $\mathcal{K}$ be a randomized function from $\mathcal{X}$ to $\mathcal{Z}$, where $\mathcal{Z} = \mathit{Range}(\mathcal{K})$ (see Figure 5.1). This function can be modeled by a channel $(\mathcal{X}, \mathcal{Z}, p_{Z \mid X}(\cdot \mid \cdot))$, where $\mathcal{X}$ and $\mathcal{Z}$ are the input and output alphabets, respectively, and $p_{Z \mid X}(\cdot \mid \cdot)$ is the channel matrix. The random variables modeling the input and output of the channel are denoted by $X$ and $Z$, respectively. The definition of differential privacy can be directly expressed as a property of the channel: it satisfies $\epsilon$-differential privacy if

$$p(z \mid x) \le e^\epsilon\, p(z \mid x') \quad \text{for all } x, x' \in \mathcal{X} \text{ with } x \sim x', \text{ and all } z \in \mathcal{Z}$$



[Figure: dataset $X$ → $\epsilon$-diff. priv. randomized function $\mathcal{K}$ → reported answer $Z$]

Figure 5.1: Randomized function $\mathcal{K}$



Intuitively, the correlation between X and Z measures how much infor- 
mation about the complete database the attacker can obtain by observing the 
reported answer. We will refer to this correlation as the leakage of the channel, denoted by $\mathcal{L}(X, Z)$. In Section 5.5 we will discuss how this leakage can
be quantified using notions from information theory, and we will study the 
behavior of the leakage for differentially private queries. 

In our model the true answer to the query $f$ is modeled by the random variable $Y$ ranging over $\mathcal{Y} = \mathit{Range}(f)$. The correlation between $Y$ and $Z$ measures how much we can learn about the real answer from the reported one. We will refer to this correlation as the utility of the channel, denoted by $\mathcal{U}(Y, Z)$. In Section 5.6 we will discuss in detail how the utility can be quantified, and we will investigate how to construct a randomization mechanism,
i.e. a way of adding noise to the query outputs, so that utility is maximized 
while preserving differential privacy. 

In practice, the randomization mechanism is often oblivious, meaning that the reported answer $Z$ only depends on the real answer $Y$ and not on the database $X$. In this case, the randomized function $\mathcal{K}$, seen as a channel, can be decomposed into two parts: a channel modeling the query $f$, and a channel modeling the oblivious randomization mechanism $\mathcal{H}$. These two channels are said to be in cascade, as the output of the first one is the input for the second one. The definition of utility can then be simplified, as it only depends on properties of the sub-channel corresponding to $\mathcal{H}$. The leakage relating $X$ and $Z$ and the utility relating $Y$ and $Z$ for a decomposed randomized function are shown in Figure 5.2.
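For an oblivious mechanism the cascade is simply a product of channel matrices: the query $f$ contributes a deterministic 0/1 matrix $F$ and the mechanism $\mathcal{H}$ a stochastic matrix $H$, so that $\mathcal{K} = F H$. The sketch below illustrates this in an assumed toy setting: $u = 2$ individuals with values in $\{0, 1\}$, the counting query $f(x) = \sum_i x_i$, and, as $\mathcal{H}$, a truncated geometric mechanism chosen only for illustration. It then checks $\epsilon$-differential privacy of the composite channel on adjacent databases; all names are hypothetical.

```python
from itertools import product
from math import exp

U, EPS = 2, 0.5
DATABASES = list(product((0, 1), repeat=U))  # X = Val^u with Val = {0, 1}
ANSWERS = list(range(U + 1))                 # Y = Range(f) for the counting query


def truncated_geometric(n, eps):
    """H[y][z] = p(z | y) on {0..n}; tail mass is folded onto the endpoints 0 and n."""
    a = exp(-eps)
    return [[(a ** abs(z - y) if z in (0, n) else (1 - a) * a ** abs(z - y)) / (1 + a)
             for z in range(n + 1)] for y in range(n + 1)]


def cascade(F, H):
    """Channel of the cascade: (F H)[x][z] = sum_y F[x][y] * H[y][z]."""
    return [[sum(F[x][y] * H[y][z] for y in range(len(H))) for z in range(len(H[0]))]
            for x in range(len(F))]


# Deterministic query channel F: row x has a single 1 in column f(x) = sum(x).
F = [[1.0 if sum(x) == y else 0.0 for y in ANSWERS] for x in DATABASES]
K = cascade(F, truncated_geometric(U, EPS))

# Ordered pairs of databases that differ in exactly one individual.
ADJACENT = [(i, j) for i, x in enumerate(DATABASES) for j, x2 in enumerate(DATABASES)
            if sum(p != q for p, q in zip(x, x2)) == 1]


def satisfies_dp(K, eps, tol=1e-12):
    return all(K[i][z] <= exp(eps) * K[j][z] + tol
               for i, j in ADJACENT for z in range(len(K[i])))


print(all(abs(sum(row) - 1) < 1e-12 for row in K))   # True: every row is a distribution
print(satisfies_dp(K, EPS), satisfies_dp(K, EPS / 2))  # True False
```

Because adjacent databases change the count by at most 1, the privacy of $\mathcal{K}$ here is inherited entirely from $\mathcal{H}$, which is why utility can be studied on the sub-channel $\mathcal{H}$ alone.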

We capture the notion of the attacker's side information as the prior dis- 
tribution on X, which is standard in information flow and also in papers on 
differential privacy [GRS09, KS]. 
Figure 5.2: Leakage and utility for oblivious mechanisms



5.2.1 Leakage about an individual 

As already discussed, $\mathcal{L}(X, Z)$ can be used to quantify the information that the attacker can learn about the whole database. Protecting the entire database at once, however, is not the main goal of differential privacy. In fact, some information will necessarily be revealed, otherwise the query would not be useful. Instead, differential privacy aims at protecting the value of any single individual, even in the worst case where the values of all other individuals are known. To quantify this information leakage we can define smaller channels, where only the information of a specific individual varies. Let $x^- \in \mathit{Val}^{u-1}$ be a $(u-1)$-tuple with the values of all individuals but one (the individual whose degree of protection we want to quantify). We create a channel $\mathcal{K}_{x^-}$ whose input alphabet is the set of all databases in which the $u - 1$ other individuals have the same values as in $x^-$. Note that, since $x^-$ is fixed, to define the input of the channel it is enough to specify the value of the individual of interest. In this way the input for the channel can be seen as a random variable $V$ ranging over the set Val. Intuitively, the information leakage of this channel measures how much information about one particular individual the attacker can learn if the values of all others are known to be $x^-$. This leakage will be studied in Section 5.5.1.
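Extracting $\mathcal{K}_{x^-}$ from $\mathcal{K}$ amounts to selecting the rows of the databases that agree with $x^-$ outside the position of the individual of interest. The sketch below assumes a small hard-coded channel on $\mathit{Val}^2$ with $\mathit{Val} = \{0, 1\}$ (its rows happen to come from a truncated geometric mechanism on the sum, with $\epsilon = \ln 2$). As one possible measure, it evaluates the min-entropy leakage of $\mathcal{K}_{x^-}$ under a uniform prior on $V$ using the usual expression $\log_2 \sum_z \max_v p(z \mid v)$. Both choices and all names are ours and only illustrative.

```python
from math import log2

# Hypothetical channel K: database (d0, d1) -> distribution over answers z in {0, 1, 2}.
K = {
    (0, 0): [2 / 3, 1 / 6, 1 / 6],
    (0, 1): [1 / 3, 1 / 3, 1 / 3],
    (1, 0): [1 / 3, 1 / 3, 1 / 3],
    (1, 1): [1 / 6, 1 / 6, 2 / 3],
}
VAL = (0, 1)


def individual_channel(K, i, x_minus, values):
    """Rows of K_{x^-}: insert each candidate value v at position i of x_minus."""
    return {v: K[x_minus[:i] + (v,) + x_minus[i:]] for v in values}


def min_entropy_leakage_uniform(channel):
    """Min-entropy leakage (bits) under a uniform prior: log2 sum_z max_v p(z | v)."""
    rows = list(channel.values())
    return log2(sum(max(row[z] for row in rows) for z in range(len(rows[0]))))


K_X_MINUS = individual_channel(K, 0, (1,), VAL)  # individual 0 varies, individual 1 has value 1
print(K_X_MINUS)
print(min_entropy_leakage_uniform(K_X_MINUS))  # log2(4/3), about 0.415
```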



5.2.2 A note on the choice of values 

The choice of the set Val depends on the assumptions about the attacker's 
knowledge. In particular, if the attacker does not know which individuals 
participate in the database, a distinguished value in Val could be interpreted 
as absence (e.g. the value 0 or the special value null). As discussed in [Dwo11], a database $x'$ adjacent to $x$ can be thought of either as being a superset (or subset) of $x$ with one extra (or missing) row, or as being exactly the same database as $x$ in all rows except for one which has a different (non-null) value. Our definition of $\sim$ with the possibility of null values covers all these cases.

At this point an important observation should be made about the choice of 
Val. Most often we are interested in protecting the actual value of an individ- 
ual, not only his participation in the database. In this case, the definition of 
differential privacy (as well as the channels we are constructing) should include 
databases with all possible values for each individual, not just the "real" ones. 
In other words, to prevent the attacker from finding out the individual's value, 
the probability p{z\x), where x contains the individual's true value, should be 
close to p{z\x') where x' contains a hypothetical value for this individual. This 
might seem unnecessary at first sight, since differential privacy is often thought 
of as protecting the participation of an individual in a database. Hiding the 
participation of an individual, however, does not imply hiding his value. Con- 
sider the following example: we aim at learning the average salary of employees 
in a small company, and it happens that all of them have exactly the same 
salary $s$. We allow anyone to participate or not, while offering $\epsilon$-differential privacy. If we only consider $s$ as the value in all possible databases, then the query is always constant, so answering it any number of times without any noise should satisfy differential privacy for any $\epsilon > 0$. Since all reported answers are $s$, the attacker can deduce that the salary of all employees, including those not participating in the query, is $s$. Indeed, the attacker cannot find out who participated, even though the value of every individual is revealed.

In other cases, we are only interested in hiding the identity of the par- 
ticipants (e.g. in a database with information about anonymous donations). 
Thus, Val should be properly selected according to the application. If who has 
participated is known and we only wish to hide the values, then Val should 
contain all possible values, e.g. all possible salaries in the example above. If 
the values are known and participation is to be hidden, then Val can contain just the values 0 and 1, denoting absence and presence respectively. Finally, if both the value and the identities of the participants are to be protected, then Val should contain all values plus null.

5.2.3 The questions we explore with the help of our model 

We will use the model we just introduced to explore the following questions: 

1. Does $\epsilon$-differential privacy induce a bound on the information leakage of the randomized function $\mathcal{K}$?

2. Does $\epsilon$-differential privacy induce a bound on the information leakage relative to an individual?

3. Does $\epsilon$-differential privacy induce a bound on the utility?






4. Given a query $f$ and a value $\epsilon > 0$, can we construct a randomized function $\mathcal{K}$ which satisfies $\epsilon$-differential privacy and also presents maximum utility?

We will see that the answers to 1 and 2 are positive in case we take the measure of leakage to be the min-entropy leakage, and we provide bounds that are tight (i.e. for every $\epsilon$ there is a $\mathcal{K}$ whose leakage reaches the bound). For 3 we are able to give a tight bound in some cases which depend on the structure of the query, and for the same cases, we are able to construct an oblivious $\mathcal{K}$ with maximum utility (defined in terms of a binary gain function), as requested by 4.

5.3 Graph symmetries 

In this section we explore some classes of graphs that will allow us to derive a strict correspondence between $\epsilon$-differential privacy and the a posteriori entropy of the input. As we already mentioned, the input domain of databases and the adjacency relation form an undirected graph, and this fact will be used to derive bounds on information leakage and utility. We will present two classes of graphs, distance-regular and $\mathcal{VT}^+$, that will be used in the next section to transform a generic channel matrix into a matrix with a symmetric structure, while preserving the a posteriori min-entropy and the $\epsilon$-differential privacy.

Let us first recall some basic notions. Given a graph $G = (V, \sim)$, the distance $d(v, w)$ between two vertices $v, w \in V$ is the number of edges in a shortest path connecting them. The diameter $\delta$ of $G$ is the maximum distance between any two vertices in $V$. The degree of a vertex is the number of edges incident to it. $G$ is called regular if every vertex has the same degree. A regular graph with vertices of degree $k$ is called a $k$-regular graph. An automorphism of $G$ is a permutation $\sigma$ on the vertex set $V$ such that for any pair of vertices $v, w$, if $v \sim w$, then $\sigma(v) \sim \sigma(w)$. If $\sigma$ is an automorphism and $v$ is a vertex, the orbit of $v$ under $\sigma$ is the set $\{v, \sigma(v), \ldots, \sigma^{k-1}(v)\}$, where $k$ is the smallest positive integer such that $\sigma^k(v) = v$. Clearly, the orbits of the vertices under $\sigma$ define a partition of $V$. If $V$ is the set of vertices of $G$, we denote by $V_d(v)$ the subset of vertices in $V$ that are at distance $d$ from the vertex $v$.
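These notions are easy to compute on small graphs. The sketch below (helper names are ours) obtains distances and the diameter by breadth-first search, and an orbit by following a permutation until it returns to its starting vertex. As an assumed example it takes the cubical graph of Figure 5.3 (vertices $0, \ldots, 7$, adjacent when their binary representations differ in one bit) and the automorphism that cyclically rotates the three bits.

```python
from collections import deque


def distances_from(adj, source):
    """d(source, w) for every vertex w reachable from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(adj):
    return max(max(distances_from(adj, v).values()) for v in adj)


def is_regular(adj):
    return len({len(neighbors) for neighbors in adj.values()}) == 1


def orbit(sigma, v):
    """[v, sigma(v), ..., sigma^(k-1)(v)] with k the least positive integer s.t. sigma^k(v) = v."""
    result = [v]
    w = sigma[v]
    while w != v:
        result.append(w)
        w = sigma[w]
    return result


CUBE = {v: [v ^ (1 << k) for k in range(3)] for v in range(8)}
ROTATE = {v: ((v << 1) & 0b111) | (v >> 2) for v in range(8)}  # cyclic bit rotation

print(diameter(CUBE), is_regular(CUBE))                          # 3 True
print(sorted({tuple(sorted(orbit(ROTATE, v))) for v in CUBE}))  # [(0,), (1, 2, 4), (3, 5, 6), (7,)]
```

The four orbits printed on the last line partition the vertex set, as stated in the text.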

The following two definitions introduce the classes of graphs that we are interested in. The first class is well known in the literature.

Definition 34 (Distance-regular graph). A graph $G = (V, \sim)$ is called distance-regular if there exist integers $b_d$ and $c_d$ ($d \in \{0, \ldots, \delta\}$), called intersection numbers, such that, for all vertices $v, w$ at distance $d(v, w) = d$, there are exactly

• $b_d$ neighbors of $w$ in $V_{d+1}(v)$

• $c_d$ neighbors of $w$ in $V_{d-1}(v)$

Some examples of distance-regular graphs are illustrated in Figure 5.3. 






(a) Tetrahedral graph (b) Cubical graph (c) Petersen graph 

Figure 5.3: Some distance-regular graphs with degree 3 

The second class we are interested in is a variant of the VT (vertex-transitive²) class:

Definition 35 ($\mathcal{VT}^+$ graph). A graph $G = (V, \sim)$ is $\mathcal{VT}^+$ (vertex-transitive+) if there are $n$ automorphisms $\sigma_0, \sigma_1, \ldots, \sigma_{n-1}$, where $n = |V|$, such that, for every vertex $v \in V$, we have that $\{\sigma_i(v) \mid 0 \le i \le n-1\} = V$.

In particular, the graphs for which there exists an automorphism $\sigma$ which induces only one orbit are $\mathcal{VT}^+$: it is sufficient to define $\sigma_i = \sigma^i$ for all $i$ from 0 to $n - 1$. Figure 5.4 illustrates some $\mathcal{VT}^+$ graphs with a single-orbit automorphism.
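Definition 35 can be checked directly on a small instance. The sketch below (illustrative, with hypothetical helper names) verifies that each candidate map is an automorphism and that, for every vertex $v$, the images $\sigma_0(v), \ldots, \sigma_{n-1}(v)$ cover $V$. For the 6-cycle of Figure 5.4(a) it uses the powers $\sigma^i$ of the rotation $\sigma(v) = v + 1 \bmod 6$, following the single-orbit construction just described.

```python
def is_automorphism(adj, sigma):
    """sigma is a bijection on V that maps every edge to an edge."""
    if sorted(sigma) != sorted(adj) or sorted(sigma.values()) != sorted(adj):
        return False
    return all(sigma[w] in adj[sigma[v]] for v in adj for w in adj[v])


def is_vt_plus(adj, sigmas):
    """Definition 35: n = |V| automorphisms with {sigma_i(v) | 0 <= i < n} = V for every v."""
    if len(sigmas) != len(adj) or not all(is_automorphism(adj, s) for s in sigmas):
        return False
    return all({s[v] for s in sigmas} == set(adj) for v in adj)


N = 6
CYCLE = {v: [(v - 1) % N, (v + 1) % N] for v in range(N)}
POWERS = [{v: (v + i) % N for v in range(N)} for i in range(N)]  # sigma^i, sigma = rotation
IDENTITIES = [{v: v for v in range(N)} for _ in range(N)]

print(is_vt_plus(CYCLE, POWERS))      # True
print(is_vt_plus(CYCLE, IDENTITIES))  # False: the identity only reaches v itself
```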




(a) Cycle: degree 2 (b) Degree 4 (c) Clique: degree 5 

Figure 5.4: Some $\mathcal{VT}^+$ graphs



From graph theory we know that neither of the two classes subsumes the other. They have, however, a non-empty intersection, which contains in particular all the structures of the form $(\mathit{Val}^u, \sim)$, i.e. the database domains.

The next two propositions show that the structure $(\mathcal{X}, \sim) = (\mathit{Val}^u, \sim)$ is both a distance-regular graph and a $\mathcal{VT}^+$ graph.



A graph $G = (V, \sim)$ is said to be vertex-transitive if for any pair $v, w \in V$ there exists
an automorphism $\sigma$ such that $\sigma(v) = w$.
Proposition 36. If $v \ge 2$, the graph $(Val^u, \sim)$ is a connected distance-regular
graph with diameter $\delta = u$, and intersection numbers $b_d = (u-d)(v-1)$ and
$c_d = d$, for all $0 \le d \le \delta$.

Proof. The vertices of $(Val^u, \sim)$ are $u$-tuples $(v_1, \ldots, v_u)$ with $v_j \in Val$, and two
vertices are adjacent if and only if they differ in exactly one element $v_j$. It is
easy to see that the distance between two vertices is the number of elements
in which they differ, so the graph is connected and its diameter is $u$. Let $x_1, x_2 \in Val^u$
with $d(x_1, x_2) = d$, so they differ in exactly $d$ elements. To obtain a neighbor of $x_2$ at
distance $d+1$ from $x_1$ we can select any of the remaining $u-d$ elements and change it in
$v-1$ possible ways, so the total number is $(u-d)(v-1)$, which depends only on $d$ and not on
$x_1, x_2$. Similarly, by changing one of the differing elements of $x_2$ to match the value in
$x_1$ we get a vertex at distance $d-1$, and there are $d$ such elements. □
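A brute-force check of Proposition 36 on small instances can make the counting argument concrete. The Python sketch below is our own illustration, not part of the thesis: it encodes $Val$ as $\{0, \ldots, v-1\}$, and the names `hamming` and `intersection_numbers` are hypothetical. For every pair of vertices it counts the neighbors one step further from, and one step closer to, the first vertex, and it fails if these counts ever depend on anything other than the distance.

```python
from itertools import product


def hamming(x, y):
    """Number of positions where two u-tuples differ: the distance in (Val^u, ~)."""
    return sum(a != b for a, b in zip(x, y))


def intersection_numbers(u, v):
    """Brute-force b_d and c_d for (Val^u, ~), encoding Val as {0, ..., v-1}.

    For every pair (x, w) with d = d(x, w), count the neighbours of w at distance
    d + 1 and d - 1 from x. Raises ValueError if these counts ever depend on the
    pair, i.e. if the graph is not distance-regular.
    """
    vertices = list(product(range(v), repeat=u))
    neighbours = {x: [y for y in vertices if hamming(x, y) == 1] for x in vertices}
    b, c = {}, {}
    for x in vertices:
        for w in vertices:
            d = hamming(x, w)
            up = sum(1 for y in neighbours[w] if hamming(x, y) == d + 1)
            down = sum(1 for y in neighbours[w] if hamming(x, y) == d - 1)
            if b.setdefault(d, up) != up or c.setdefault(d, down) != down:
                raise ValueError("not distance-regular")
    return b, c
```

For instance, `intersection_numbers(3, 3)` should return `b = {0: 6, 1: 4, 2: 2, 3: 0}` and `c = {0: 0, 1: 1, 2: 2, 3: 3}`, in agreement with $b_d = (u-d)(v-1)$ and $c_d = d$.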

Proposition 37. The graph $(Val^u, \sim)$ is a VT+ graph.

Proof. Recall that we assume the values in the set $Val$ to be indexed, i.e.
$Val = \{v_0, \ldots, v_j, \ldots, v_{v-1}\}$, where $v = |Val|$. Note that, for convenience, we
opt to use here the indexing from $0$ to $v-1$. Let us define a bijective function
$\rho : Val \to Val$ as

$$\rho(v_j) = v_{j \oplus 1}$$

for every $v_j \in Val$, where $\oplus$ represents the sum modulo $v$. We define the
composition of $\rho$ with itself $i$ times as

$$\rho^i(v_j) = \underbrace{\rho \circ \rho \circ \cdots \circ \rho}_{i \text{ times}}(v_j)$$

Note that since $\rho$ is injective, $\rho^i$ is injective as well.

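We represent a database in $Val^u$ as $x = v_{k_0} \ldots v_{k_i} \ldots v_{k_{u-1}}$, with $0 \le i \le u-1$
and $0 \le k_i \le v-1$. We now define a family $\{\sigma_\ell\}_{\ell=0}^{v^u-1}$ of automorphisms
as follows. Given $0 \le \ell \le v^u - 1$, consider the representation in base $v$ of $\ell$:

$$\ell = \ell_0 \cdot v^0 + \ldots + \ell_i \cdot v^i + \ldots + \ell_{u-1} \cdot v^{u-1} \tag{5.3}$$

where $0 \le \ell_i \le v-1$. Then define

$$\sigma_\ell(x) = \rho^{\ell_0}(v_{k_0}) \ldots \rho^{\ell_i}(v_{k_i}) \ldots \rho^{\ell_{u-1}}(v_{k_{u-1}}) \tag{5.4}$$

where $x = v_{k_0} \ldots v_{k_i} \ldots v_{k_{u-1}}$.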
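We have to show that:

• $\sigma_\ell$ is an automorphism for all $0 \le \ell \le v^u - 1$.

First we show that $\sigma_\ell$ is injective. Let us consider two arbitrary databases
$x = v_{k_0} \ldots v_{k_i} \ldots v_{k_{u-1}}$ and $x' = v_{k'_0} \ldots v_{k'_i} \ldots v_{k'_{u-1}}$, and recall that
$\sigma_\ell = \rho^{\ell_0}(\cdot) \ldots \rho^{\ell_i}(\cdot) \ldots \rho^{\ell_{u-1}}(\cdot)$. If $x \ne x'$ then $v_{k_i} \ne v_{k'_i}$ for some $i$, and
since $\rho^{\ell_i}$ is injective we have $\rho^{\ell_i}(v_{k_i}) \ne \rho^{\ell_i}(v_{k'_i})$. Therefore
$\sigma_\ell(x) \ne \sigma_\ell(x')$. Since $Val^u$ is finite, $\sigma_\ell$ is a bijection.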

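Now we show that if $x \sim x'$ then $\sigma_\ell(x) \sim \sigma_\ell(x')$. Consider an arbitrary pair
of adjacent databases $x = v_{k_0} \ldots v_{k_i} \ldots v_{k_{u-1}}$ and
$x' = v_{k_0} \ldots v_{k'_i} \ldots v_{k_{u-1}}$, where $x$ and $x'$ differ exactly in $v_{k_i} \ne v_{k'_i}$. We
know that $\sigma_\ell(x) = \rho^{\ell_0}(v_{k_0}) \ldots \rho^{\ell_i}(v_{k_i}) \ldots \rho^{\ell_{u-1}}(v_{k_{u-1}})$ and we also know
that $\sigma_\ell(x') = \rho^{\ell_0}(v_{k_0}) \ldots \rho^{\ell_i}(v_{k'_i}) \ldots \rho^{\ell_{u-1}}(v_{k_{u-1}})$. Therefore $\sigma_\ell(x)$ and
$\sigma_\ell(x')$ can differ at most in $\rho^{\ell_i}(v_{k_i})$ and $\rho^{\ell_i}(v_{k'_i})$. Since $\rho^{\ell_i}$ is injective,
we have $\rho^{\ell_i}(v_{k_i}) \ne \rho^{\ell_i}(v_{k'_i})$, and it follows that $\sigma_\ell(x) \sim \sigma_\ell(x')$.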

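• For every $x = v_{k_0} \ldots v_{k_i} \ldots v_{k_{u-1}}$ in $Val^u$ we have $\bigcup_{\ell=0}^{v^u-1} \{\sigma_\ell(x)\} = Val^u$.
Take an arbitrary element $x' = v_{k'_0} \ldots v_{k'_i} \ldots v_{k'_{u-1}}$ in $Val^u$. Note that
$\rho^{s \ominus t}(v_t) = v_s$ for all $0 \le s, t \le v-1$, where $\ominus$ represents the subtraction
modulo $v$. Therefore the automorphism
$\sigma = \rho^{k'_0 \ominus k_0}(\cdot) \ldots \rho^{k'_i \ominus k_i}(\cdot) \ldots \rho^{k'_{u-1} \ominus k_{u-1}}(\cdot)$ satisfies $\sigma(x) = x'$. Since $0 \le k'_i \ominus k_i \le v-1$, we have
that $\sigma = \sigma_\ell$ for $\ell = (k'_0 \ominus k_0) \cdot v^0 + \ldots + (k'_i \ominus k_i) \cdot v^i + \ldots + (k'_{u-1} \ominus k_{u-1}) \cdot v^{u-1}$,
and therefore $\sigma$ belongs to the family $\{\sigma_\ell\}_{\ell=0}^{v^u-1}$.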

□ 
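The family $\{\sigma_\ell\}$ of (5.4) is easy to compute, which gives a quick sanity check of Proposition 37 on small instances. In the Python sketch below, which is our own illustration, $Val$ is encoded as $\{0, \ldots, v-1\}$ so that $\rho$ becomes `(k + 1) % v`; the names `sigma`, `ell_for` and `check_vt_plus` are hypothetical. `ell_for` computes the index $\ell$ used at the end of the proof to map $x$ to $x'$.

```python
from itertools import product


def sigma(ell, x, v):
    """sigma_ell of (5.4): shift component i of x by the base-v digit ell_i of ell
    (least significant digit first, as in (5.3)), i.e. apply rho^{ell_i}."""
    out = []
    for k in x:
        ell, digit = divmod(ell, v)
        out.append((k + digit) % v)
    return tuple(out)


def ell_for(x, y, v):
    """Index ell with sigma_ell(x) = y, as in the proof: digit i is (k'_i - k_i) mod v."""
    return sum(((b - a) % v) * v**i for i, (a, b) in enumerate(zip(x, y)))


def check_vt_plus(u, v):
    """Check that every sigma_ell is an automorphism of (Val^u, ~) and that
    {sigma_ell(x) | 0 <= ell < v**u} = Val^u for every x."""
    vertices = list(product(range(v), repeat=u))
    n = len(vertices)

    def adjacent(x, y):
        return sum(a != b for a, b in zip(x, y)) == 1

    for ell in range(n):
        image = {x: sigma(ell, x, v) for x in vertices}
        if len(set(image.values())) != n:  # not a bijection
            return False
        if any(adjacent(x, y) and not adjacent(image[x], image[y])
               for x in vertices for y in vertices):  # adjacency not preserved
            return False
    return all({sigma(ell, x, v) for ell in range(n)} == set(vertices)
               for x in vertices)
```

For example, with $v = 3$ the database $(0, 2, 1)$ is mapped to $(1, 1, 1)$ by $\sigma_7$, since the digit-wise differences modulo 3 are $1, 2, 0$ and $7 = 1 + 2 \cdot 3$.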

Figure 5.5 illustrates some examples of structures $(Val^u, \sim)$. Note that
when $|Val| = 2$, $(Val^u, \sim)$ is the $u$-dimensional hypercube.
(b) $u = 3$, $Val = \{a, b, c\}$ (for readability's sake we show only part of the
graph)

Figure 5.5: Some $(Val^u, \sim)$ graphs



The relation between the graph structures we consider in this chapter is summarized
in Figure 5.6. We remark that in general the graphs $(Val^u, \sim)$ do not
have a single-orbit automorphism.







Figure 5.6: Venn diagram for the classes of graphs considered in this section.
Here $S^* = \{Val^u \mid |Val| = 2, u \le 2\}$

5.4 Deriving the relation between differential privacy and quantitative information flow on the basis of the graph structure

In this section we present the main technical contribution of the chapter: a
general technique that exploits the graph structure induced by the adjacency
relation $\sim$ on $\mathcal{X}$ and the query $f$ to determine relations between $\varepsilon$-differential
privacy and min-entropy leakage, and between $\varepsilon$-differential privacy and utility.
We use the symmetries of the graph structure $(\mathcal{X}, \sim)$ to transform the channel
matrix into an equivalent matrix with certain regularities. These regularities
are the key that allows us to establish the link between $\varepsilon$-differential privacy
and the a posteriori min-entropy (i.e. the conditional min-entropy associated
to the channel). Bounds on the a posteriori entropy will in turn
allow us to derive bounds on leakage and utility: Section 5.5 deals
with leakage and Section 5.6 deals with utility.

First, in Section 5.4.2 we present how to perform the transformation
on the channel matrix, and in Section 5.4.3 we show how to derive a bound
on the a posteriori min-entropy for the matrix obtained. It is important to note
that we consider the case where the channel input has the uniform distribution.
This is not a restriction for our bounds on the leakage: as seen in Chapter 3, the
maximum min-entropy leakage is achieved by the uniform input distribution
and, therefore, any bound for the uniform input distribution is also a bound for
all other input distributions. In the case of utility the assumption of uniform
input distribution is more restrictive, but we will see that it still provides
interesting results for several practical cases.

Before we present formally our technique, let us fix some notation. 

5.4.1 Assumptions and notation 

In the rest of this section we consider channels (usually referred to by $M$, $M'$,
$M''$ or $N$) with input $A$ and output $B$ with finite carriers $\mathcal{A} = \{a_0, \ldots, a_{n-1}\}$
and $\mathcal{B} = \{b_0, \ldots, b_{m-1}\}$, respectively, and we assume that the probability
distribution of $A$ is uniform. Furthermore, we assume that $|\mathcal{A}| = n \le |\mathcal{B}| = m$.
If it is the case that $n > m$, we just add to the matrix enough zero-ed columns,
i.e. columns containing only 0's, so as to match the number of rows. Note
that adding zero-ed columns changes neither the min-entropy leakage nor
the conditional min-entropy of the channel. We assume as well an adjacency
relation $\sim$ on $\mathcal{A}$, i.e. that $(\mathcal{A}, \sim)$ is an undirected graph structure. With a
slight abuse of notation, we will also write $i \sim h$ when $i$ and $h$ are associated
to adjacent elements of $\mathcal{A}$, and we will write $d(i, h)$ to denote the distance
between the elements of $\mathcal{A}$ associated to $i$ and $h$. More generally, we may use
the number $i$ to denote the element $a_i$ of $\mathcal{A}$ (or, equivalently, the element $b_i$ of
$\mathcal{B}$) whenever it is clear from the context.

We note that a channel matrix $M$ satisfies $\varepsilon$-differential privacy if for each
column $j$ and for each pair of rows $i$ and $h$ such that $i \sim h$ we have that:

$$\frac{1}{e^\varepsilon} \le \frac{M_{i,j}}{M_{h,j}} \le e^\varepsilon$$

The a posteriori entropy of a channel with matrix $M$ will be denoted by
$H_\infty^M(A|B)$, and its min-entropy leakage by $I_\infty^M(A;B)$.
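The $\varepsilon$-differential privacy condition on a channel matrix can be checked mechanically. The Python sketch below is our own illustration (the name `satisfies_dp` and the tolerance parameter are hypothetical). It tests $M_{i,j} \le e^\varepsilon M_{h,j}$ for both orders of each adjacent pair, which is equivalent to the two-sided ratio condition above but avoids dividing by zero when a column contains zeros.

```python
import math


def satisfies_dp(M, adjacent, eps, tol=1e-12):
    """Return True if M[i][j] <= e^eps * M[h][j] for every column j and every
    ordered pair of distinct adjacent rows (i, h).

    Checking both orders of each pair is equivalent to the two-sided condition
    1/e^eps <= M[i][j] / M[h][j] <= e^eps, but it avoids dividing by zero when
    a column contains zeros. `tol` absorbs floating-point rounding.
    """
    bound = math.exp(eps)
    rows = range(len(M))
    return all(M[i][j] <= bound * M[h][j] + tol
               for i in rows for h in rows if i != h and adjacent(i, h)
               for j in range(len(M[i])))
```

For instance, the randomized-response matrix `[[0.75, 0.25], [0.25, 0.75]]` on two adjacent rows satisfies the condition for $\varepsilon = \ln 3$ but not for $\varepsilon = \ln 2$.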

We denote by $M[l \to k]$ the matrix obtained by "collapsing" the column $l$
into $k$, i.e.

$$M[l \to k]_{i,j} = \begin{cases} M_{i,k} + M_{i,l} & \text{if } j = k,\\ 0 & \text{if } j = l,\\ M_{i,j} & \text{otherwise.} \end{cases}$$



Given a partial function $\rho : \mathcal{A} \to \mathcal{B}$, the image of $\mathcal{A}$ under $\rho$ is $\rho(\mathcal{A}) =
\{\rho(a) \mid a \in \mathcal{A}, \rho(a) \ne \bot\}$, where $\bot$ stands for "undefined".

In the proofs we will need several indices, and we will typically use
the letters $i, j, h, k, l$ to range over rows and columns (usually $i, h, l$ will range
over rows and $j, k$ will range over columns). Given a matrix $M$, we denote by
$\max_j^M$ the maximum value of column $j$ over all rows $i$, i.e. $\max_j^M = \max_i M_{i,j}$,
and by $\max^M = \max_{i,j} M_{i,j}$ the maximum element of the matrix.

Finally, given a graph $G = (V, \sim)$ with diameter $\delta$, we denote by $\Delta_G$ the
set $\{0, 1, \ldots, \delta\}$. We may omit the subscript and denote the set only by $\Delta$ if
no confusion can arise. The notation $V_{(d)}(v)$ represents the
subset of $V$ of all elements $w$ at distance $d$ from $v$. For a fixed $d$, we define
$n_d = |V_{(d)}(v)|$ as the number of vertices in $V$ at distance $d$ from $v$, and we
intend that it will always be clear from the context to which set of vertices $V$
and element $v$ the value $n_d$ is associated.



5.4.2 The matrix transformation 

The transformation on the channel matrices is divided into two steps, and we
start this section by giving an overview of the process. Consider a channel
whose matrix $M$ has at least as many columns as rows and assume that the
input distribution is uniform. First, we transform $M$ into a matrix $M'$ in which
each of the first $n$ columns has a maximum in the diagonal, and the remaining
columns are all 0's. Second, under the assumption that the input domain is
distance-regular or VT+, we transform $M'$ into a matrix $M''$ whose diagonal
elements are all the same, and coincide with the maximum element $\max^{M''}$
of $M''$. The transformation ensures that both $M'$ and $M''$ are valid channel
matrices (i.e. each row is a probability distribution), respect $\varepsilon$-differential
privacy, and preserve the value of the a posteriori entropy for the uniform
input distribution. A scheme of the transformation is shown in Figure 5.7,
where Lemma 38 (Step 1) is applied in the first step of the transformation,
and in the second step either Lemma 39 (Step 2a) or Lemma 40 (Step 2b) is
applied, depending on whether the graph structure is distance-regular or VT+,
respectively.



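[Figure 5.7 diagram: the matrix $M$, with entries $M_{0,0}, M_{0,1}, \ldots, M_{n-1,0}, M_{n-1,1}, \ldots$, is transformed by Lemma Step 1 (any graph structure) into $M'$, which is then transformed into $M''$ either by Lemma Step 2a (dist-reg) or by Lemma Step 2b (VT+).]

Figure 5.7: Steps of the matrix transformation for distance-regular and VT+
graphs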



We now present the transformation formally. The next lemma concerns
the first step.

Lemma 38 (Step 1). Let $M$ be a channel matrix of dimensions $n \times m$ with
at least as many columns as rows, and assume that $M$ satisfies $\varepsilon$-differential
privacy. Then it is possible to transform $M$ into a matrix $M'$ satisfying the
following conditions:

(i) $M'$ is a valid channel matrix: $\sum_{j=0}^{m-1} M'_{i,j} = 1$ for all $0 \le i \le n-1$;

(ii) Each of the first $n$ columns has a maximum in the diagonal: $M'_{i,i} = \max_i^{M'}$ for all $0 \le i \le n-1$;

(iii) The $m-n$ last columns contain only 0's: $M'_{i,j} = 0$ for all $0 \le i \le n-1$
and all $n \le j \le m-1$;

(iv) $M'$ satisfies $\varepsilon$-differential privacy: $\frac{M'_{i,j}}{M'_{h,j}} \le e^\varepsilon$ for all $0 \le i, h \le n-1$ s.t.
$i \sim h$ and all $0 \le j \le m-1$;

(v) $H_\infty^{M'}(A|B) = H_\infty^{M}(A|B)$, if $A$ has the uniform distribution.

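Proof. We first show that there exists a matrix $N$ of dimensions $n \times m$, and
an injective total function $\rho : \mathcal{A} \to \mathcal{B}$ such that:

• $N_{i,\rho(i)} = \max_{\rho(i)}^N$ for all $i \in \mathcal{A}$, and

• $N_{i,j} = 0$ for all $j \in \mathcal{B} \setminus \rho(\mathcal{A})$ and all $i \in \mathcal{A}$.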

We iteratively construct $\rho$ and $N$ "column by column" via a sequence of
approximating partial functions $\rho_s$ and matrices $N_s$ ($0 \le s \le m$).

• Initial step ($s = 0$)

Define $\rho_0(i) = \bot$ for all $i \in \mathcal{A}$ and $N_0 = M$.

• $s$-th step ($1 \le s \le m$)

Let $j$ be the $s$-th column and let $i \in \mathcal{A}$ be one of the rows containing
the maximum value of column $j$ in $M$, i.e. $M_{i,j} = \max_j^M$. There are two
cases:

1. $\rho_{s-1}(i) = \bot$. We define:

$\rho_s = \rho_{s-1} \cup \{i \mapsto j\}$ and

$N_s = N_{s-1}$

2. $\rho_{s-1}(i) = k \in \mathcal{B}$. We "collapse" column $j$ into column $k$ (recall the
notation introduced in Section 5.4.1):

$\rho_s = \rho_{s-1}$ and
$N_s = N_{s-1}[j \to k]$

To avoid heavy notation, here we use the convention established in Section 5.4.1
and denote $N_{a_i, b_j}$, where $a_i \in \mathcal{A}$ and $b_j \in \mathcal{B}$, simply by $N_{i,j}$.
Since at each step the column $j$ is either assigned in $\rho_s$ or collapsed (and hence
zeroed) in $N_s$, all unassigned columns $\mathcal{B} \setminus \rho_m(\mathcal{A})$ must be zero in $N_m$. We
finish the construction by taking $\rho$ to be the same as $\rho_m$ after assigning to
each unassigned row one of the columns in $\mathcal{B} \setminus \rho_m(\mathcal{A})$ (there are enough such
columns since $n \le m$). We also take $N = N_m$. Note that by construction $N$
is a channel matrix.

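Thus we get a matrix $N$ and a function $\rho : \mathcal{A} \to \mathcal{B}$ which, by construction,
is injective and satisfies $N_{i,\rho(i)} = \max_{\rho(i)}^N$ for all $i \in \mathcal{A}$, and $N_{i,j} = 0$ for all
$j \in \mathcal{B} \setminus \rho(\mathcal{A})$ and all $i \in \mathcal{A}$. Furthermore, $N$ provides $\varepsilon$-differential privacy
(condition (iv)) because each column of $N$ is a sum of columns of $M$, and the
inequality $M_{i,j} \le e^\varepsilon M_{h,j}$ is preserved when summing over a set of columns.
It is also easy to see that $\sum_j \max_j^N = \sum_j \max_j^M$ (when column $j$ is collapsed into
column $k$, both columns attain their maximum in the same row $i$, so the maximum of
their sum is the sum of their maxima), and from that it immediately
follows that $H_\infty^N(A|B) = H_\infty^M(A|B)$ (recall that $A$ has the uniform distribution
and therefore the a posteriori entropy is a function of the sum of the maxima
of the columns), so condition (v) is satisfied.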

Finally, we create our claimed matrix $M'$ from $N$ just by rearranging the
columns according to $\rho$. Note that the order of the columns is irrelevant, since
any permutation represents the same conditional probabilities and therefore
the same channel. The resulting matrix $M'$ has all maxima in the diagonal
for $0 \le i \le n-1$, and every element in the columns $n \le j \le m-1$ is 0,
which satisfies conditions (ii) and (iii). Also, since $N$ is a valid channel matrix,
so is $M'$, and condition (i) is also satisfied.

□ 
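The column-by-column construction in the proof of Lemma 38 can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: matrices are lists of rows, ties for a column maximum are broken by taking the first such row, leftover zero columns are handed out in an arbitrary order, and the names `step1` and `a_posteriori_min_entropy` are hypothetical. The latter uses the uniform-input formula $H_\infty^M(A|B) = -\log_2 \frac{1}{n}\sum_j \max_j^M$ recalled in Section 5.4.3.

```python
import math


def a_posteriori_min_entropy(M):
    """H_inf(A|B) = -log2((1/n) * sum_j max_i M[i][j]) for a uniform input."""
    n = len(M)
    return -math.log2(sum(max(row[j] for row in M) for j in range(len(M[0]))) / n)


def step1(M):
    """Sketch of the transformation of Lemma 38 for an n x m matrix with n <= m."""
    n, m = len(M), len(M[0])
    N = [list(row) for row in M]  # N_0 = M
    rho = {}                      # partial function rho_s from rows to columns
    for j in range(m):            # step s handles column j = s - 1
        i = max(range(n), key=lambda r: M[r][j])  # a row with M[i][j] = max_j^M
        if i not in rho:          # case 1: assign column j to row i
            rho[i] = j
        else:                     # case 2: collapse column j into k = rho(i)
            k = rho[i]
            for r in range(n):
                N[r][k] += N[r][j]
                N[r][j] = 0.0
    # Unassigned columns are all zero; give one of them to each unassigned row.
    free = [j for j in range(m) if j not in rho.values()]
    for i in range(n):
        if i not in rho:
            rho[i] = free.pop()
    # Rearrange so that column rho(i) becomes column i; leftover zero columns go last.
    order = [rho[i] for i in range(n)] + free
    return [[row[j] for j in order] for row in N]
```

On the $3 \times 4$ matrix `[[0.5, 0.1, 0.3, 0.1], [0.4, 0.2, 0.3, 0.1], [0.3, 0.3, 0.2, 0.2]]`, for example, columns 2 and 3 are collapsed into columns 0 and 1, and the result `[[0.8, 0, 0.2, 0], [0.7, 0, 0.3, 0], [0.5, 0, 0.5, 0]]` (up to rounding) has its column maxima on the diagonal, a zero last column, and the same sum of column maxima, 1.3.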

The second step of the transformation depends on the graph structure of
$(\mathcal{A}, \sim)$. But before we discuss this step, let us introduce a notion of distance
between elements in $\mathcal{B}$, derived from the notion of distance between elements
in $\mathcal{A}$. Let $M$ be a channel matrix in which the maximum of each column is
in the diagonal, as in Figure 5.8. Then we define the distance between two
elements $j_1, j_2 \in \mathcal{B}$ as follows:

$$d(j_1, j_2) = \begin{cases} d(i_1, i_2) & \text{if there are } i_1, i_2 \in \mathcal{A} \text{ such that } i_1 = j_1 \text{ and } i_2 = j_2,\\ \bot & \text{otherwise.} \end{cases} \tag{5.5}$$

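Note that, apart from $\bot$, the range of the notion of distance defined above is the set
$\Delta = \{0, 1, \ldots, \delta\}$, where $\delta$ is the diameter of $(\mathcal{A}, \sim)$. Based on (5.5), we define
the set $\mathcal{B}_{(d)}(j)$ as the subset of $\mathcal{B}$ of elements at distance $d$ from an element
$j \in \mathcal{B}$. It is clear that for any $j \in \{0, \ldots, n-1\}$ we have $\bigcup_{d \in \Delta} \mathcal{B}_{(d)}(j) = \{0, \ldots, n-1\}$.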

We can extend the adjacency relation $\sim$ on $\mathcal{A}$ to an adjacency relation $\sim'$
on $\mathcal{B}$ by using the notion of distance of (5.5). For any $j_1, j_2 \in \mathcal{B}$, we have



*Note that by rearranging the columns of the channel matrix we may change the marginal
probability of the outputs. This, however, does not pose a problem for our purposes, since
the a posteriori min-entropy of the channel is preserved. If we want the marginal
probability of the outputs to remain unchanged, we can just "relabel" the columns after the
rearrangement so that they match the correct outputs.
[Figure 5.8 diagram: an $n \times m$ matrix $M$ with entries $M_{0,0}, \ldots, M_{n-1,m-1}$; an element $M_{i,j'}$ of row $i$ is labelled with its distance $d(i,j')$ to the diagonal element $M_{j',j'}$ of its column, and likewise $M_{i,j''}$ with $d(i,j'')$.]

Figure 5.8: The relation between elements of a row $i$ and the elements in the
diagonal



$j_1 \sim' j_2$ if and only if $d(j_1, j_2) = 1$. Therefore, if $(\mathcal{A}, \sim)$ is distance-regular, so
is $(\mathcal{B}, \sim')$ restricted to its first $n$ elements.

Now we are ready to present the lemma for the second step of the trans- 
formation, in the case of distance-regular graphs. 

Lemma 39 (Step 2a). Let $M'$ be a channel matrix of dimensions $n \times m$ with
at least as many columns as rows, and assume that $M'$ satisfies $\varepsilon$-differential
privacy. Let $\sim$ be an adjacency relation on $\mathcal{A}$ such that the graph $(\mathcal{A}, \sim)$
is connected and distance-regular. Assume that the maximum value of each
column is on the diagonal, that is $M'_{i,i} = \max_i^{M'}$ for all $i \in \mathcal{A}$, and that all the
last $m-n$ columns have only zero elements, i.e. $M'_{i,j} = 0$ for all $0 \le i \le n-1$
and $n \le j \le m-1$. Then it is possible to transform $M'$ into a matrix $M''$
satisfying the following conditions:

(i) $M''$ is a valid channel matrix: $\sum_{j=0}^{m-1} M''_{i,j} = 1$ for all $0 \le i \le n-1$;

(ii) The elements of the diagonal are all the same, and are equal to the maximum
of the matrix: $M''_{i,i} = \max^{M''}$ for all $0 \le i \le n-1$;

(iii) The $m-n$ last columns contain only 0's: $M''_{i,j} = 0$ for all $0 \le i \le n-1$
and all $n \le j \le m-1$;

(iv) $M''$ satisfies $\varepsilon$-differential privacy: $\frac{M''_{i,j}}{M''_{h,j}} \le e^\varepsilon$ for all $0 \le i, h \le n-1$ s.t.
$i \sim h$ and all $0 \le j \le m-1$;

(v) $H_\infty^{M''}(A|B) = H_\infty^{M'}(A|B)$, if $A$ has the uniform distribution.
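Proof. Let us define $B^* = \{0, 1, \ldots, n-1\}$, i.e. the subset of $\mathcal{B}$ that excludes
the zero-ed columns of $M'$ from $n$ to $m-1$, and write $B^*_{(d)}(i) = \mathcal{B}_{(d)}(i) \cap B^*$. Note that we can safely use the
set $B^*$ instead of $\mathcal{B}$ in this proof because the zero-ed columns do not contribute
to the a posteriori entropy, and trivially respect $\varepsilon$-differential privacy.
We then define the matrix $M''$ as follows.

$$M''_{i,j} = \begin{cases} \dfrac{1}{n\,|B^*_{(d(i,j))}(i)|} \displaystyle\sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d(i,j))}(k)} M'_{h,k} & \text{if } 0 \le j \le n-1,\\ 0 & \text{otherwise.} \end{cases}$$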



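By the definition above, condition (iii) is immediately satisfied. We then
show that this definition also induces a channel matrix. We have

$$\sum_{j \in B^*} M''_{i,j} = \sum_{j \in B^*} \frac{1}{n\,|B^*_{(d(i,j))}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d(i,j))}(k)} M'_{h,k}$$

Recall that $\Delta = \{0, \ldots, \delta\}$, where $\delta$ is the diameter of the graph. Note that
for every $i$, $B^* = \bigcup_{d \in \Delta} B^*_{(d)}(i)$, and for different values of $d$ the sets $B^*_{(d)}(i)$ are
disjoint. Therefore the summation over $j \in B^*$ can be split as follows

$$\begin{aligned} \sum_{j \in B^*} M''_{i,j} &= \frac{1}{n} \sum_{d \in \Delta} \sum_{j \in B^*_{(d)}(i)} \frac{1}{|B^*_{(d)}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d)}(k)} M'_{h,k} \\ &= \frac{1}{n} \sum_{k \in B^*} \sum_{d \in \Delta} \sum_{h \in \mathcal{A}_{(d)}(k)} M'_{h,k} \sum_{j \in B^*_{(d)}(i)} \frac{1}{|B^*_{(d)}(i)|} \end{aligned}$$

and, as $\sum_{j \in B^*_{(d)}(i)} \frac{1}{|B^*_{(d)}(i)|} = 1$, we obtain

$$\sum_{j \in B^*} M''_{i,j} = \frac{1}{n} \sum_{k \in B^*} \sum_{d \in \Delta} \sum_{h \in \mathcal{A}_{(d)}(k)} M'_{h,k}$$

and now the summations over $h$ can be joined together

$$\sum_{j \in B^*} M''_{i,j} = \frac{1}{n} \sum_{k \in B^*} \sum_{h \in \mathcal{A}} M'_{h,k} = 1$$

(the last step holds because each row of $M'$ sums to 1 and the columns outside $B^*$ are zero),
which implies that condition (i) is satisfied.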

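We now turn our attention to the elements of the diagonal. Since $B^*_{(0)}(i) = \{i\}$ and
$\mathcal{A}_{(0)}(k) = \{k\}$, we have

$$M''_{i,i} = \frac{1}{n} \sum_{h \in \mathcal{A}} M'_{h,h}$$

and so they are all identical. To fulfill condition (ii) we still need to show that
$M''_{i,i} = \max^{M''}$ for all $i \in \mathcal{A}$, i.e. that no element exceeds the diagonal:

$$\begin{aligned} M''_{i,j} &= \frac{1}{n\,|B^*_{(d(i,j))}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d(i,j))}(k)} M'_{h,k} \\ &\le \frac{1}{n\,|B^*_{(d(i,j))}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d(i,j))}(k)} M'_{k,k} && \text{(since the biggest element of each column is in the diagonal)} \\ &= \frac{1}{n} \sum_{k \in B^*} M'_{k,k} \cdot 1 && \text{(since the graph is distance-regular, } |\mathcal{A}_{(d)}(k)| = |B^*_{(d)}(i)|) \\ &= M''_{j,j} \end{aligned}$$

Since $A$ has the uniform distribution and $\sum_j \max_j^{M''} = n M''_{i,i} = \sum_h M'_{h,h} = \sum_j \max_j^{M'}$,
$H_\infty^{M'}(A|B) = H_\infty^{M''}(A|B)$ (condition (v)) follows immediately.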

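It remains to show that $M''$ satisfies $\varepsilon$-differential privacy (condition (iv)).
We need to show that

$$M''_{i,j} \le e^\varepsilon M''_{i',j} \qquad \forall j \in \mathcal{B},\ i, i' \in \mathcal{A} : i \sim i'$$

From the triangle inequality we have (since $d(i,i') = 1$)

$$d(i',j) - 1 \le d(i,j) \le d(i',j) + 1$$

Thus, there are 3 possible cases:

1. $d(i,j) = d(i',j)$

The result is immediate since $M''_{i,j} = M''_{i',j}$: in a distance-regular graph $|B^*_{(d)}(i)| = n_d$
does not depend on $i$, so $M''_{i,j}$ depends only on $d(i,j)$.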

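2. $d(i,j) = d(i',j) - 1$

We define the set of neighbors of $h$ "one step further away" from $k$:

$$F_{h,k} = \{h' \mid h' \sim h,\ h' \in \mathcal{A}_{(d(h,k)+1)}(k)\}$$

Note that $|F_{h,k}| = b_{d(h,k)}$ since the graph is distance-regular. The following
inequalities hold for any $h, k \in \mathcal{A}$:

$$\begin{aligned} M'_{h,k} &\le e^\varepsilon M'_{h',k} \quad \forall h' \in F_{h,k} && \text{(diff. privacy)} \\ b_{d(h,k)} M'_{h,k} &\le e^\varepsilon \sum_{h' \in F_{h,k}} M'_{h',k} && \text{(sum of the above)} \end{aligned}$$

We now fix a distance $d$ and sum the above inequalities for all vertices $h$ at
distance $d$ from $k$:

$$b_d \sum_{h \in \mathcal{A}_{(d)}(k)} M'_{h,k} \le e^\varepsilon \sum_{h \in \mathcal{A}_{(d)}(k)} \sum_{h' \in F_{h,k}} M'_{h',k}$$

Note that each $h' \in \mathcal{A}_{(d+1)}(k)$ is contained in $F_{h,k}$ for exactly $c_{d+1}$
different $h \in \mathcal{A}_{(d)}(k)$. So the right-hand side above sums all vertices of
$\mathcal{A}_{(d+1)}(k)$ exactly $c_{d+1}$ times each. Thus we get that for all $k \in B^*$, $d \in \Delta$:

$$b_d \sum_{h \in \mathcal{A}_{(d)}(k)} M'_{h,k} \le e^\varepsilon c_{d+1} \sum_{h \in \mathcal{A}_{(d+1)}(k)} M'_{h,k} \tag{5.6}$$

Finally, note that $c_{d+1}|\mathcal{A}_{(d+1)}(i)| = b_d|\mathcal{A}_{(d)}(i)|$ (both sides count the
number of edges between a vertex at distance $d$ and a vertex at distance
$d+1$ from $i$). So, writing $d = d(i,j)$ (hence $d(i',j) = d+1$) and using $|B^*_{(d)}(i)| = |\mathcal{A}_{(d)}(i)|$, we have

$$\begin{aligned} M''_{i,j} &= \frac{1}{n\,|\mathcal{A}_{(d)}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d)}(k)} M'_{h,k} \\ &\le \frac{e^\varepsilon c_{d+1}}{n\, b_d\, |\mathcal{A}_{(d)}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d+1)}(k)} M'_{h,k} && \text{(by (5.6))} \\ &= \frac{e^\varepsilon}{n\,|\mathcal{A}_{(d+1)}(i)|} \sum_{k \in B^*} \sum_{h \in \mathcal{A}_{(d+1)}(k)} M'_{h,k} \\ &= e^\varepsilon M''_{i',j} && \text{(distance-regularity: } |\mathcal{A}_{(d+1)}(i')| = |\mathcal{A}_{(d+1)}(i)|) \end{aligned}$$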

3. $d(i,j) = d(i',j) + 1$

This case is analogous to the case where $d(i,j) = d(i',j) - 1$.

□ 
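Because $M''_{i,j}$ in the proof of Lemma 39 depends only on $d(i,j)$, the averaging can be sketched in a few lines. In the Python sketch below, the names `step2a` and `hamming_rr` are hypothetical. As an assumed test input we use our own choice of channel on the distance-regular graph $(Val^u, \sim)$ of Proposition 36: independent randomized response on each component, keeping component $i$ with probability `ps[i]`. It satisfies the assumptions of the lemma whenever `ps[i] > (1 - ps[i]) / (v - 1)`, but it is not itself a function of the distance alone.

```python
import math
from itertools import product


def step2a(Mp, dist):
    """Sketch of M'' from the proof of Lemma 39, first n columns only.

    Mp:   the n x n block of M' (the zero-ed columns are dropped).
    dist: dist[i][j] is the graph distance in a connected distance-regular (A, ~).
    M''[i][j] = S_d / (n * n_d) with d = dist[i][j], where S_d sums all M'[h][k]
    with dist[h][k] = d, and n_d = |A_(d)(i)| does not depend on i.
    """
    n = len(Mp)
    S, n_d = {}, {}
    for h in range(n):
        for k in range(n):
            S[dist[h][k]] = S.get(dist[h][k], 0.0) + Mp[h][k]
    for j in range(n):
        n_d[dist[0][j]] = n_d.get(dist[0][j], 0) + 1
    return [[S[dist[i][j]] / (n * n_d[dist[i][j]]) for j in range(n)] for i in range(n)]


def hamming_rr(u, v, ps):
    """Hypothetical test channel on Val^u: component i is kept with probability
    ps[i] and otherwise replaced by one of the other v - 1 values uniformly."""
    xs = list(product(range(v), repeat=u))
    dist = [[sum(a != b for a, b in zip(x, z)) for z in xs] for x in xs]
    M = [[math.prod(p if a == b else (1 - p) / (v - 1) for p, a, b in zip(ps, x, z))
          for z in xs] for x in xs]
    return M, dist
```

With `hamming_rr(2, 3, (0.6, 0.8))`, which is $\ln 8$-differentially private, the output of `step2a` has rows summing to 1, a constant diagonal equal to its maximum $0.48$, the same sum of column maxima as the input, and it still satisfies $\ln 8$-differential privacy.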

The next lemma concerns the second step of the transformation, for
the case of VT+ graphs.

Lemma 40 (Step 2b). Consider a channel matrix $M'$ satisfying the assumptions
of Lemma 39, except for the assumption about distance-regularity, which
we replace by the assumption that $(\mathcal{A}, \sim)$ is VT+. Then it is possible to transform
$M'$ into a matrix $M''$ with the same properties as in Lemma 39.

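Proof. Let us define $B^* = \{0, 1, \ldots, n-1\}$, i.e. the subset of $\mathcal{B}$ that excludes
the zero-ed columns of $M'$ from $n$ to $m-1$. Note that we can safely use the
set $B^*$ instead of $\mathcal{B}$ in this proof because the zero-ed columns do not contribute
to the a posteriori entropy, and trivially respect $\varepsilon$-differential privacy.
We then define the matrix $M''$ as follows, where $\sigma_0, \ldots, \sigma_{n-1}$ are the automorphisms of Definition 35.

$$M''_{i,j} = \begin{cases} \dfrac{1}{n} \displaystyle\sum_{h=0}^{n-1} M'_{\sigma_h(i), \sigma_h(j)} & \text{if } 0 \le j \le n-1,\\ 0 & \text{otherwise.} \end{cases}$$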



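By the definition above, condition (iii) is immediately satisfied. We then
show that this definition also induces a channel matrix. Recall that $\{\sigma_h(j) \mid 0 \le
h \le n-1\} = \mathcal{A}$ since the graph is VT+.

$$\begin{aligned} \sum_{j=0}^{n-1} M''_{i,j} &= \sum_{j=0}^{n-1} \frac{1}{n} \sum_{h=0}^{n-1} M'_{\sigma_h(i), \sigma_h(j)} \\ &= \frac{1}{n} \sum_{h=0}^{n-1} \sum_{j=0}^{n-1} M'_{\sigma_h(i), \sigma_h(j)} \\ &= \frac{1}{n} \sum_{h=0}^{n-1} 1 && \text{(since } \sigma_h \text{ is a permutation)} \\ &= 1 \end{aligned}$$

which implies that condition (i) is satisfied.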

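Now we prove that the diagonal contains the maximum values of the matrix
(condition (ii)), i.e. for every $i$, $M''_{i,i} = \max^{M''}$. It is easy to see that, by
definition, the elements of the diagonal are all the same (they are the average
of the diagonal elements of $M'$, since $\{\sigma_h(i) \mid 0 \le h \le n-1\} = \mathcal{A}$). Then we need to show that they are the
maximum of each column, from which it follows that they are the maximum
of the matrix.

$$\begin{aligned} M''_{i,j} &= \frac{1}{n} \sum_{h=0}^{n-1} M'_{\sigma_h(i), \sigma_h(j)} \\ &\le \frac{1}{n} \sum_{h=0}^{n-1} M'_{\sigma_h(j), \sigma_h(j)} && \text{(since } M'_{\sigma_h(j), \sigma_h(j)} = \max_{\sigma_h(j)}^{M'}) \\ &= M''_{j,j} \end{aligned}$$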



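We now prove that $M''$ provides $\varepsilon$-differential privacy (condition (iv)). For
every pair $i \sim i'$ and every $j$:

$$\begin{aligned} M''_{i,j} &= \frac{1}{n} \sum_{h=0}^{n-1} M'_{\sigma_h(i), \sigma_h(j)} \\ &\le \frac{1}{n} \sum_{h=0}^{n-1} e^\varepsilon M'_{\sigma_h(i'), \sigma_h(j)} && \text{(by differential privacy, since } \sigma_h(i) \sim \sigma_h(i') \text{ as } \sigma_h \text{ is an automorphism)} \\ &= e^\varepsilon M''_{i',j} \end{aligned}$$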

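Finally, we prove condition (v):

$$\begin{aligned} H_\infty^{M''}(A|B) &= -\log_2 \frac{1}{n} \sum_{i=0}^{n-1} M''_{i,i} \\ &= -\log_2 \frac{1}{n} \sum_{i=0}^{n-1} \frac{1}{n} \sum_{h=0}^{n-1} M'_{\sigma_h(i), \sigma_h(i)} \\ &= -\log_2 \frac{1}{n} \sum_{i=0}^{n-1} M'_{i,i} && \text{(since } \{\sigma_h(i) \mid 0 \le h \le n-1\} = \mathcal{A}) \\ &= H_\infty^{M'}(A|B) && \text{(since } M'_{i,i} = \max_i^{M'}) \end{aligned}$$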



□ 
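The averaging over automorphisms in the proof of Lemma 40 is similarly short. The Python sketch below is our own illustration (the name `step2b` is hypothetical); it takes the $n$ automorphisms as permutation tables, `sigmas[h][x]` $= \sigma_h(x)$, and drops the zero-ed columns as the proof does.

```python
def step2b(Mp, sigmas):
    """Sketch of M'' in Lemma 40 (first n columns only):
    M''[i][j] = (1/n) * sum_h M'[sigma_h(i)][sigma_h(j)],
    where sigmas[h][x] = sigma_h(x) for the n automorphisms of a VT+ graph."""
    n = len(Mp)
    return [[sum(Mp[s[i]][s[j]] for s in sigmas) / n for j in range(n)]
            for i in range(n)]
```

As an assumed example, take the 4-cycle 0-1-2-3-0, whose rotations $\sigma_h(x) = x + h \bmod 4$ come from a single-orbit automorphism, and the $\ln 2$-differentially private matrix `[[0.4, 0.3, 0.1, 0.2], [0.2, 0.5, 0.2, 0.1], [0.1, 0.3, 0.4, 0.2], [0.2, 0.2, 0.2, 0.4]]`, which has its column maxima on the diagonal. The result depends only on $j - i \bmod 4$, with values $0.425, 0.225, 0.125, 0.225$: the diagonal is the average $0.425$ of the original diagonal, rows sum to 1, and the ratio between adjacent rows stays below 2.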

5.4.3 The bound on the a posteriori entropy of the channel 

Once the transformation presented in the previous section has been applied,
and the channel matrix respects the properties of $M''$, we can use again the
graph structure of $(\mathcal{A}, \sim)$ to determine a bound on the a posteriori entropy
$H_\infty^{M''}(A|B)$ of $M''$. Recall that our matrix transformation preserves the value
of the a posteriori conditional entropy, so the bound we find is also valid for
the original channel matrix we started with.

It is a known result in the literature (cf. [BCP09]) that, if the distribution
on $A$ is uniform, then the a posteriori entropy of the channel $M$ is given by

$$H_\infty^M(A|B) = -\log_2 \frac{1}{n} \sum_j \max_j^M$$

Hence, under our assumption that the input distribution $A$ is uniform,
and knowing that in the matrix $M''$ the diagonal elements are all equal to the
maximum $\max^{M''}$ (and the remaining columns are zero), we have

$$H_\infty^{M''}(A|B) = -\log_2 \max^{M''} \tag{5.7}$$
Therefore to find a bound on the a posteriori entropy of the channel $M''$
it is enough to find a bound on $\max^{M''}$. This is exactly what we do in this
section.

We proceed by noting that the property of $\varepsilon$-differential privacy induces a
relation between the ratio of elements at any distance:

Remark 41. Let $M$ be a matrix satisfying $\varepsilon$-differential privacy. Then, for
any column $j$, and any pair of rows $i$ and $h$ we have that:

$$\frac{1}{e^{\varepsilon d(i,h)}} \le \frac{M_{i,j}}{M_{h,j}} \le e^{\varepsilon d(i,h)}$$

In particular, if the diagonal elements of $M$ are equal to the
maximum element $\max^M$, then for each element $M_{i,j}$ we have that:

$$M_{i,j} \ge \frac{\max^M}{e^{\varepsilon d(i,j)}} \tag{5.8}$$

which motivates the next proposition.
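Remark 41 follows by chaining the one-step condition along a shortest path. A short derivation, under the simplifying assumption that the entries of column $j$ are positive (in a connected graph, $\varepsilon$-differential privacy forces a column with one zero entry to be zero everywhere), with $i = i_0 \sim i_1 \sim \cdots \sim i_d = h$ and $d = d(i,h)$:

$$\frac{M_{i,j}}{M_{h,j}} = \prod_{t=0}^{d-1} \frac{M_{i_t,j}}{M_{i_{t+1},j}} \le \left(e^{\varepsilon}\right)^{d} = e^{\varepsilon d(i,h)}$$

Exchanging the roles of $i$ and $h$ gives the lower bound, and taking $h = j$, so that $M_{h,j} = M_{j,j} = \max^M$, the lower bound becomes (5.8).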

Proposition 42. Let $M$ be a channel matrix satisfying $\varepsilon$-differential privacy
where the diagonal elements are the maximum element $\max^M$ of the matrix.
Then:

$$\max^M \le \frac{1}{\sum_{d \in \Delta} \frac{n_d}{e^{\varepsilon d}}}$$

where $\Delta = \{0, 1, \ldots, \delta\}$, $\delta$ is the diameter of the graph $(\mathcal{A}, \sim)$, and $n_d =
|\mathcal{A}_{(d)}(i)|$ is the number of elements $M_{i,j}$ of row $i$ that are at distance $d$ from the
corresponding diagonal element $M_{j,j}$, i.e. such that $d(i,j) = d$.

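Proof. The elements of any given row $i$ of $M$ represent a probability distribution,
therefore they sum to 1:

$$\sum_j M_{i,j} = 1$$

By substituting (5.8) in the equation above we obtain:

$$\sum_j \frac{\max^M}{e^{\varepsilon d(i,j)}} \le 1$$

and therefore, grouping the columns $j$ by their distance $d = d(i,j)$ from $i$,

$$\max^M \le \frac{1}{\sum_{d \in \Delta} \frac{n_d}{e^{\varepsilon d}}}$$

□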
Putting together all the steps of this section, we obtain our main result.

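Theorem 43. Consider a channel matrix $M$ satisfying $\varepsilon$-differential privacy
for some $\varepsilon > 0$, and assume that $(\mathcal{A}, \sim)$ is either distance-regular or VT+.
Then, for the uniform input distribution, we have:

$$H_\infty^M(A|B) \ge -\log_2 \frac{1}{\sum_{d \in \Delta} \frac{n_d}{e^{\varepsilon d}}} \tag{5.9}$$

where $n_d = |\mathcal{A}_{(d)}(i)|$ is the number of nodes $j \in \mathcal{A}$ at distance $d$ from $i \in \mathcal{A}$.

Moreover, this bound is tight, in the sense that we can build a matrix for
which (5.9) holds with equality.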

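Proof. The inequality follows directly from (5.7) and Proposition 42, applied to the
matrix $M''$ obtained from $M$ by Lemma 38 followed by Lemma 39 or Lemma 40, which has
the same a posteriori entropy as $M$. To prove that the bound is tight, it is sufficient to
define each element $M_{i,j}$ according to (5.8) with equality instead of inequality, taking
$\max^M$ equal to the right-hand side of Proposition 42. □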
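Theorem 43 and its tightness can be checked numerically on a graph given by its distance matrix. The Python sketch below uses hypothetical names; as an assumed test case it encodes the Petersen graph of Figure 5.3 as the Kneser graph $K(5,2)$ (2-subsets of $\{0, \ldots, 4\}$, adjacent when disjoint), which is distance-regular with $n_0 = 1$, $n_1 = 3$ and $n_2 = 6$. `tight_channel` builds the matrix of the tightness argument, i.e. (5.8) with equality.

```python
import math
from itertools import combinations


def tight_channel(dist, eps):
    """(5.8) with equality: M[i][j] = max^M * e^(-eps * d(i, j)), where
    max^M = 1 / sum_d n_d e^(-eps * d) as in Proposition 42.
    Assumes n_d = |A_(d)(i)| does not depend on i (distance-regular or VT+)."""
    n = len(dist)
    Z = sum(math.exp(-eps * d) for d in dist[0])  # equals sum_d n_d e^(-eps d)
    return [[math.exp(-eps * dist[i][j]) / Z for j in range(n)] for i in range(n)]


def bound_5_9(dist, eps):
    """Right-hand side of (5.9): -log2(1 / sum_d n_d e^(-eps d))."""
    return math.log2(sum(math.exp(-eps * d) for d in dist[0]))


def a_posteriori_min_entropy(M):
    """H_inf(A|B) = -log2((1/n) * sum_j max_i M[i][j]) for a uniform input."""
    n = len(M)
    return -math.log2(sum(max(r[j] for r in M) for j in range(len(M[0]))) / n)


def petersen_distances():
    """Petersen graph as K(5,2): distance 1 if disjoint, 2 if sharing one element."""
    vs = list(combinations(range(5), 2))
    return [[0 if a == b else 1 if not set(a) & set(b) else 2 for b in vs] for a in vs]
```

With $\varepsilon = \ln 2$ the sum in (5.9) is $1 + 3/2 + 6/4 = 4$, so the bound is 2 bits, and the tight matrix has every diagonal entry equal to $1/4$.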

In the next sections we will see how to use this theorem for establishing a 
bound on the leakage and on the utility. 



5.5 Application to leakage 

As discussed in Section 5.2, the correlation $\mathcal{C}(X, Z)$ between $X$ and $Z$
measures the information that the attacker can learn about the database by
observing the reported answers. In this section we consider the min-entropy
leakage as a measure of this information, that is $\mathcal{C}(X, Z) = I_\infty(X; Z)$. We then
investigate bounds on information leakage imposed by differential privacy.

Before we continue, let us make a very important observation about the 
results we obtain in this section. 

Remark 44. The bounds on the min-entropy leakage we present in this section
(Theorem 45, Proposition 48, and Proposition 49) are derived under the
assumption that the input distribution $X$ for the channel is uniform. As seen in
Chapter 3, we know from the literature [BCP09, Smi09] that the min-entropy
leakage $I_\infty^M(X; Z)$ of a given matrix $M$ is maximum when the input distribution
is uniform (even though this may not be the only case). Therefore the bounds
we present in this section, although based on the assumption that $X$ has the
uniform distribution, are valid for every possible input distribution. As we
model side information as input distributions, and as we provide bounds on
the leakage for any possible input distribution, it follows that our bounds on
the min-entropy leakage are valid for any possible side information the attacker
may have.

Our first result shows that the min-entropy leakage of a randomized function
$\mathcal{K}$ is bounded by a quantity depending on $\varepsilon$, and on the numbers $u = |Ind|$
and $v = |Val|$ of individuals and values respectively. We assume that $v \ge 2$.
As seen in Section 5.2, $\mathcal{K}$ can be modeled as a channel with input $X$ and
output $Z$. From Propositions 36 and 37 we know that $(\mathcal{X}, \sim)$ is both distance-regular
and VT+, and therefore we can apply Theorem 43. Then, by (5.8), we
know that for $j \in \mathcal{X}_{(d)}(x)$ (i.e. every $j$ in $\mathcal{X}$ at distance $d$ from a given $x$) it
is the case that $M_{x,j} \ge \frac{\max^M}{e^{\varepsilon d}}$. Furthermore we note that each element $j$ at
distance $d$ from $x$ can be obtained by changing the value of $d$ individuals in
the $u$-tuple representing $x$. We can choose those $d$ individuals in $\binom{u}{d}$ possible
ways, and for each of these individuals we can change the value (with respect
to the one in $x$) in $v-1$ possible ways. Therefore $|\mathcal{X}_{(d)}(x)| = \binom{u}{d}(v-1)^d$, and
we obtain that the number of databases at distance $d$ from $x$ is

$$n_d = |\mathcal{X}_{(d)}(x)| = \binom{u}{d}(v-1)^d \tag{5.10}$$

In fact, recall that $x$ can be represented as a $u$-tuple with values in $Val$. We
need to select $d$ individuals in the $u$-tuple and then change their values, and
each of them can be changed in $v-1$ different ways.

Using the value of $n_d$ from (5.10) in Theorem 43 we obtain the following
result.

Theorem 45. If $\mathcal{K}$ satisfies $\varepsilon$-differential privacy, then the information leakage
is bounded from above as follows:

$$I_\infty(X; Z) \le u \log_2 \frac{v\, e^\varepsilon}{v - 1 + e^\varepsilon} = Bnd(u, v, \varepsilon)$$

Proof. For this proof we need a matrix with all column maxima on the diagonal,
and all equal. We obtain such a matrix by transforming the matrix
associated to $\mathcal{K}$ as follows: first we apply Lemma 38 to it (with $\mathcal{A} = \mathcal{X}$ and
$\mathcal{B} = \mathcal{Z}$), and then we apply either Lemma 39 or Lemma 40 (we can choose
either of them, since $(\mathcal{X}, \sim)$ is both distance-regular and VT+). The final matrix
$M$ has all non-zero elements in its $n \times n$ submatrix, with $n = |\mathcal{X}| = v^u$,
provides $\varepsilon$-differential privacy, and for every row $i$ we have that $M_{i,i} = \max^M$.
Furthermore, $I_\infty^M(X; Z)$ is equal to the min-entropy leakage of $\mathcal{K}$, assuming a
uniform distribution on $X$.
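Then we can derive:

$$\sum_{j} M_{i,j} \ge \sum_{d=0}^{u} n_d \frac{\max^M}{e^{\varepsilon d}} = \sum_{d=0}^{u} \binom{u}{d} (v-1)^d \frac{\max^M}{e^{\varepsilon d}} \qquad \text{(by (5.10))}$$

Since each row represents a probability distribution, the elements of row $i$
must sum up to 1:

$$\max^M \sum_{d=0}^{u} \binom{u}{d} (v-1)^d \frac{1}{e^{\varepsilon d}} \le 1$$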
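and by multiplying both sides of the inequality by $e^{\varepsilon u}$ we get

$$\max^M \sum_{d=0}^{u} \binom{u}{d} (v-1)^d e^{\varepsilon(u-d)} \le e^{\varepsilon u}$$

Since by the binomial expansion

$$\sum_{d=0}^{u} \binom{u}{d} (v-1)^d e^{\varepsilon(u-d)} = (v - 1 + e^\varepsilon)^u,$$

we obtain:

$$\max^M \le \left(\frac{e^\varepsilon}{v - 1 + e^\varepsilon}\right)^u \tag{5.11}$$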

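Therefore:

$$\begin{aligned} I_\infty^M(X; Z) &= H_\infty(X) - H_\infty^M(X|Z) && \text{(by definition)} \\ &= \log_2 v^u + \log_2 \max^M && \text{(by (5.7))} \\ &\le \log_2 v^u + \log_2 \left(\frac{e^\varepsilon}{v - 1 + e^\varepsilon}\right)^u && \text{(by (5.11))} \\ &= u \log_2 \frac{v\, e^\varepsilon}{v - 1 + e^\varepsilon} \end{aligned}$$

To conclude our proof we recall that, since the above bound on $I_\infty^M(X; Z)$
is valid for the case where $X$ has the uniform distribution, it is also valid for
any distribution on $X$ (Remark 44).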

□ 

Note that the bound $Bnd(u, v, \varepsilon) = u \log_2 \frac{v\, e^\varepsilon}{v - 1 + e^\varepsilon}$ is a continuous function
of $\varepsilon$, has value 0 when $\varepsilon = 0$, and converges to $u \log_2 v$ as $\varepsilon$ approaches infinity.
Figure 5.9 shows the growth of $Bnd(u, v, \varepsilon)$ with $\varepsilon$, for various fixed
values of $u$ and $v$.
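The closed form of $Bnd$ and the sum it comes from can be compared numerically. In the Python sketch below (hypothetical names `bnd` and `bnd_via_sum`), `bnd_via_sum` evaluates $\log_2 v^u - \log_2 \sum_d n_d e^{-\varepsilon d}$ with $n_d$ from (5.10), which is what Theorem 43 gives for the leakage; by the binomial expansion used in the proof of Theorem 45 the two should agree up to rounding.

```python
import math


def bnd(u, v, eps):
    """Bnd(u, v, eps) = u * log2(v * e^eps / (v - 1 + e^eps)) from Theorem 45."""
    return u * math.log2(v * math.exp(eps) / (v - 1 + math.exp(eps)))


def bnd_via_sum(u, v, eps):
    """log2(v^u) - log2(sum_d n_d e^(-eps d)) with n_d = C(u, d) (v - 1)^d, as in (5.10)."""
    s = sum(math.comb(u, d) * (v - 1) ** d * math.exp(-eps * d) for d in range(u + 1))
    return u * math.log2(v) - math.log2(s)
```

For instance, `bnd(100, 10, 0.0)` is 0, for large $\varepsilon$ the value approaches $u \log_2 v$, and at $\varepsilon = 1$, $u = 100$ the values for $v = 2, 10, 100$ increase in that order, which is the ordering of the lines in Figure 5.9.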




Figure 5.9: Graphs of $Bnd(u, v, \varepsilon)$ for $u = 100$ and $v = 2$ (lowest line), $v = 10$
(intermediate line), and $v = 100$ (highest line), respectively.

The next proposition shows that the bound obtained in the previous theorem
is tight.
Proposition 46. For every $u$, $v$, and $\varepsilon$ there exists a randomized function $\mathcal{K}$
which provides $\varepsilon$-differential privacy and whose min-entropy leakage, for the
uniform input distribution, is $I_\infty(X; Z) = Bnd(u, v, \varepsilon)$.

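Proof. The adjacency relation in $\mathcal{X}$ determines a graph structure $G_{\mathcal{X}}$. Set
$\mathcal{Z} = \mathcal{X}$ and define the matrix of $\mathcal{K}$ as follows:

$$p_{\mathcal{K}}(z \mid x) = \left(\frac{e^\varepsilon}{v - 1 + e^\varepsilon}\right)^u \frac{1}{e^{\varepsilon d}} = \frac{2^{Bnd(u, v, \varepsilon)}}{v^u\, e^{\varepsilon d}} \tag{5.12}$$

where $d$ is the distance between $x$ and $z$ in $G_{\mathcal{X}}$.

We need to show that $p_{\mathcal{K}}(\cdot \mid x)$ is a probability distribution for every $x$. Grouping
the $z$ by their distance $d$ from $x$, and using (5.10) and the binomial expansion from the
proof of Theorem 45:

$$\sum_z p_{\mathcal{K}}(z \mid x) = \left(\frac{e^\varepsilon}{v - 1 + e^\varepsilon}\right)^u \sum_{d=0}^{u} \frac{n_d}{e^{\varepsilon d}} = \left(\frac{e^\varepsilon}{v - 1 + e^\varepsilon}\right)^u \frac{(v - 1 + e^\varepsilon)^u}{e^{\varepsilon u}} = 1$$

To see that $\mathcal{K}$ provides $\varepsilon$-differential privacy, just take $d = 1$ in (5.12): the distances
from two adjacent $x, x'$ to any $z$ differ by at most 1, so the corresponding entries differ by
at most a factor $e^\varepsilon$. To see that $I_\infty(X; Z) = Bnd(u, v, \varepsilon)$, take $d = 0$ in the same equation:
the maximum of each column is $2^{Bnd(u, v, \varepsilon)}/v^u$, so the leakage is
$\log_2 v^u + \log_2 (2^{Bnd(u, v, \varepsilon)}/v^u) = Bnd(u, v, \varepsilon)$.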

□ 
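The optimal mechanism (5.12) can be built explicitly for small $u$ and $v$. The Python sketch below (hypothetical names) constructs it on $\mathcal{X} = \mathcal{Z} = Val^u$, with $Val$ encoded as $\{0, \ldots, v-1\}$, and computes the min-entropy leakage for the uniform input as $\log_2 \sum_z \max_x p_{\mathcal{K}}(z \mid x)$, which is $H_\infty(X) - H_\infty(X|Z)$ with the formula recalled in Section 5.4.3.

```python
import math
from itertools import product


def bnd(u, v, eps):
    """Bnd(u, v, eps) from Theorem 45."""
    return u * math.log2(v * math.exp(eps) / (v - 1 + math.exp(eps)))


def optimal_mechanism(u, v, eps):
    """Matrix (5.12) on X = Z = Val^u: p(z|x) = (e^eps / (v - 1 + e^eps))^u / e^(eps d(x, z))."""
    xs = list(product(range(v), repeat=u))
    top = (math.exp(eps) / (v - 1 + math.exp(eps))) ** u  # entry for d = 0, i.e. max^M
    M = [[top * math.exp(-eps * sum(a != b for a, b in zip(x, z))) for z in xs]
         for x in xs]
    return xs, M


def min_entropy_leakage_uniform(M):
    """I_inf(X;Z) = log2 sum_z max_x p(z|x) for the uniform input distribution."""
    return math.log2(sum(max(row[j] for row in M) for j in range(len(M[0]))))
```

For $u = 2$, $v = 3$ and $e^\varepsilon = 2$ (the setting of Example 7 with $\varepsilon = \ln 2$, an illustrative choice), every diagonal entry is $2^2/4^2 = 1/4$, each row sums to $\frac{1}{4}(1 + 4/2 + 4/4) = 1$ since $n_0 = 1$, $n_1 = n_2 = 4$, and the leakage is $2\log_2 1.5 \approx 1.17$ bits, compared with $\log_2 9 \approx 3.17$ bits for a channel that reveals the dataset completely.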

We now give an example of the use of $Bnd(u, v, \varepsilon)$ as a bound for the
min-entropy leakage.

Example 7. Assume that we are interested in the eye color of a certain population
$Ind = \{Alice, Bob\}$. Let $Val = \{a, b, c\}$, where $a$ stands for absent
(i.e. the null value), $b$ stands for blue, and $c$ stands for coalblack. We can
represent each dataset as a tuple $d_0 d_1$, where $d_0 \in Val$ represents the eye
color of Alice (cases $d_0 = b$ and $d_0 = c$), or that Alice is not in the dataset
(case $d_0 = a$), and $d_1$ provides the same kind of information for Bob. Note that
$v = 3$. Figure 5.10(a) represents the set $\mathcal{X}$ of all possible datasets and its adjacency
relation. Figure 5.10(b) represents the matrix with input $\mathcal{X}$ which provides
$\varepsilon$-differential privacy and has the highest min-entropy leakage. In the representation
of the matrix, the generic entry $\alpha$ stands for $\max^M / e^{\varepsilon\alpha}$, where $\max^M$ is
the highest value in the matrix, i.e.

$$\max^M = \frac{e^{u\varepsilon}}{(v - 1 + e^\varepsilon)^u} = \frac{e^{2\varepsilon}}{(2 + e^\varepsilon)^2}$$

Note that the bound $Bnd(u, v, \varepsilon)$ is guaranteed to be reached with the
uniform input distribution. The construction of the matrix for Proposition 46
gives a square matrix of dimension $|Val|^u \times |Val|^u$. Often, however, the range
of $\mathcal{K}$ is fixed, as it is usually related to the possible answers to the query $f$.
(b) The representation of the matrix

Figure 5.10: Universe and highest min-entropy leakage matrix giving $\varepsilon$-differential
privacy for Example 7.



Hence it is natural to consider the scenario in which we are given a number
$r < |Val|^u$, and want to consider only those $\mathcal{K}$'s whose range has cardinality
at most $r$. Proposition 48 shows that in this restricted setting we can find
a better bound than the one given by Theorem 45. But first we need the
following lemma.

Lemma 47. Let $\mathcal{K}$ be a randomized function with input $X$, where $\mathcal{X} = \mathit{Val}^u$, providing $\varepsilon$-differential privacy. Assume that $r = |\mathit{Range}(\mathcal{K})| = v^\ell$, for some $\ell < u$. Let $M$ be the matrix associated to $\mathcal{K}$. Then it is possible to build a square matrix $M'$ of size $v^\ell \times v^\ell$, with row and column indices in $\mathcal{A} \subseteq \mathcal{X}$, and a binary relation $\sim' \subseteq \mathcal{A} \times \mathcal{A}$ such that $(\mathcal{A}, \sim')$ is isomorphic to $(\mathit{Val}^\ell, \sim_\ell)$, and such that:

(i) $M'$ is a valid channel matrix: $\sum_{j} M'_{i,j} = 1$ for every row $i$;

(ii) $M'_{i,j} \le (e^{\varepsilon})^{u-\ell+d} M'_{h,j}$ for all $i, h \in \mathcal{X}$ and $j \in \mathcal{Y}$, where $d$ is the distance between $i$ and $h$;

(iii) the elements of the diagonal are all equal to the maximum element of the matrix: $M'_{i,i} = \max^{M'}$ for all $i \in \mathcal{X}$;

(iv) $H_\infty^{M'}(X|Y) = H_\infty^{M}(X|Y)$, if $X$ has the uniform distribution.

Proof. We first apply a procedure similar to that of Lemma 38 to construct a square matrix of size $v^\ell \times v^\ell$ which has the maximum values of each column in the diagonal. (In this case we construct an injection from the columns to the rows containing their maximum value, and we eliminate the rows that at the end are not associated to any column.) Then define $\sim'$ as the projection of $\sim_u$ on $\mathit{Val}^\ell$. It is easy to see that condition (ii) is satisfied by this definition of $\sim'$. Finally, apply the procedure in Lemma 39, or equivalently the procedure in Lemma 40, on the structure $(\mathcal{A}, \sim')$ to make all elements in the diagonal equal to the maximum element of the matrix (condition (iii)). Note that this procedure preserves the property of condition (ii) and the conditional min-entropy (condition (iv)). Also the matrix obtained is a valid channel matrix (condition (i)). □

Now we are ready to prove the proposition. 

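Proposition 48. Let $\mathcal{K}$ be a randomized function with associated channel matrix $M$, and let $r = |\mathit{Range}(\mathcal{K})|$. If $\mathcal{K}$ provides $\varepsilon$-differential privacy then the min-entropy leakage associated to $\mathcal{K}$ is bounded from above as follows:

$$I_\infty(X;Z) \le \log_2 \frac{v^\ell (e^{\varepsilon})^u}{(v-1+e^{\varepsilon})^\ell - (e^{\varepsilon})^\ell + (e^{\varepsilon})^u}$$

where $\ell = \lfloor \log_v r \rfloor$.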

Proof. Assume first that $r$ is of the form $v^\ell$. We transform the matrix $M$ associated to $\mathcal{K}$ by applying Lemma 47, and let $M'$ be the resulting matrix. Let us denote by $\max^{M'}$ the value of every element in the diagonal of $M'$, i.e. $\max^{M'} = M'_{i,i}$ for every row $i$, and let us denote by $\mathcal{A}'_d(i)$ the set of elements whose $\sim'$-distance from $i$ is $d$. Note that for every $j \in \mathcal{A}'_d(i)$ we have that $M'_{j,j} \le M'_{i,j}(e^{\varepsilon})^{u-\ell+d}$, hence

$$M'_{i,j} \ge \frac{\max^{M'}}{(e^{\varepsilon})^{u-\ell+d}}$$

Furthermore each element $j$ at $\sim'$-distance $d$ from $i$ can be obtained by changing the value of $d$ individuals in the $\ell$-tuple representing $i$ (remember that $(\mathcal{A}, \sim')$ is isomorphic to $(\mathit{Val}^\ell, \sim_\ell)$). We can choose those $d$ individuals in $\binom{\ell}{d}$ possible ways, and for each of these individuals we can change the value (with respect to the one in $i$) in $v - 1$ possible ways. Therefore

$$|\mathcal{A}'_d(i)| = \binom{\ell}{d}(v-1)^d$$
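Taking into account that for $d = 0$ (the diagonal element $M'_{i,i} = \max^{M'}$) we do not need to divide by $(e^{\varepsilon})^{u-\ell}$, we obtain:

$$\max^{M'} + \sum_{d=1}^{\ell} \binom{\ell}{d}(v-1)^d \frac{\max^{M'}}{(e^{\varepsilon})^{u-\ell+d}} \le \sum_{j} M'_{i,j}$$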

Since each row represents a probability distribution, the elements of row i 
must sum up to 1. Hence: 

$$\max^{M'} + \sum_{d=1}^{\ell} \binom{\ell}{d}(v-1)^d \frac{\max^{M'}}{(e^{\varepsilon})^{u-\ell+d}} \le 1 \qquad (5.13)$$

By performing some simple calculations, similar to those of the proof of 
Theorem 45, we obtain: 

$$\max^{M'} \le \frac{(e^{\varepsilon})^u}{(v-1+e^{\varepsilon})^\ell - (e^{\varepsilon})^\ell + (e^{\varepsilon})^u}$$
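Therefore:

$$\begin{aligned}
I^{M'}_\infty(X;Z) &= H_\infty(X) - H^{M'}_\infty(X|Z) && \text{(by definition)} \quad (5.14)\\
&= \log_2 v^u + \log_2 \sum_{j} \max_{i} \tfrac{1}{v^u} M'_{i,j} && (5.15)\\
&= \log_2 v^u + \log_2 \tfrac{1}{v^u} + \log_2 (v^\ell \max^{M'}) && (5.16)\\
&\le \log_2 \frac{v^\ell (e^{\varepsilon})^u}{(v-1+e^{\varepsilon})^\ell - (e^{\varepsilon})^\ell + (e^{\varepsilon})^u} && \text{(by (5.13))} \quad (5.17)
\end{aligned}$$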

Consider now the case in which $r$ is not of the form $v^\ell$. Let $\ell$ be the maximum integer such that $v^\ell \le r$, and let $m = r - v^\ell$. We transform the matrix $M$ associated to $\mathcal{K}$ by collapsing the $m$ columns with the smallest maxima into the $m$ columns with the highest maxima. Namely, let $j_1, j_2, \ldots, j_m$ be the indices of the columns which have the smallest maxima values, i.e. $\max^{M}_{j_t} \le \max^{M}_{j}$ for every $t$ and every column $j \notin \{j_1, j_2, \ldots, j_m\}$. Similarly, let $k_1, k_2, \ldots, k_m$ be the indices of the columns which have the highest maxima values. Then, define

$$N = M[j_1 \to k_1][j_2 \to k_2] \ldots [j_m \to k_m]$$

where $M[j \to k]$ denotes the matrix obtained from $M$ by adding column $j$ to column $k$ and setting column $j$ to zero. Finally, eliminate the $m$ zeroed columns to obtain a matrix with exactly $v^\ell$ columns. It is easy to show that

C{x-z) < C{x-z)^ 

V 

After transforming $N$ into a matrix $M'$ with the same min-entropy leakage, as described in the first part of this proof, from (5.14) we conclude

I^iX;Z)<I^'{X;Z)^<log, 



□ 

Note that this bound can be much smaller than the one provided by Theorem 45. For instance, if $r = v$ this bound becomes:

$$\log_2 \frac{v\,(e^{\varepsilon})^u}{v - 1 + (e^{\varepsilon})^u}$$

which for large values of $u$ is much smaller than $\mathit{Bnd}(u,v,\varepsilon)$.
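The comparison can be checked numerically. The Python sketch below assumes that Theorem 45 states the bound $\mathit{Bnd}(u,v,\varepsilon) = u\log_2\frac{v e^{\varepsilon}}{v-1+e^{\varepsilon}}$, which is the value the bound of Proposition 48 takes when $\ell = u$; all function names are illustrative. It also checks, in exact rational arithmetic with $e^{\varepsilon}$ replaced by a rational stand-in, that the closed form for $\max^{M'}$ is exactly the value that turns (5.13) into an equality, which is the binomial-theorem step of the proof.

```python
import math
from fractions import Fraction
from math import comb


def max_from_513(u, ell, v, e):
    """Value of max^{M'} that turns (5.13) into an equality; e stands for e^eps."""
    s = Fraction(1) + sum(Fraction(comb(ell, d) * (v - 1) ** d) / e ** (u - ell + d)
                          for d in range(1, ell + 1))
    return 1 / s


def max_closed_form(u, ell, v, e):
    """Closed form obtained from (5.13) via the binomial theorem."""
    return e ** u / ((v - 1 + e) ** ell - e ** ell + e ** u)


def bnd(u, v, eps):
    """Assumed form of Bnd(u, v, eps)."""
    e = math.exp(eps)
    return u * math.log2(v * e / (v - 1 + e))


def prop48_bound(u, v, eps, r):
    """Bound of Proposition 48 when the range of K has r elements."""
    e = math.exp(eps)
    ell = 0
    while v ** (ell + 1) <= r:  # ell = floor(log_v r), computed with integers only
        ell += 1
    return math.log2(v ** ell * e ** u / ((v - 1 + e) ** ell - e ** ell + e ** u))


# Exact check of the algebra, with e^eps replaced by the rational stand-in 3/2.
stand_in = Fraction(3, 2)
assert all(max_from_513(u, ell, v, stand_in) == max_closed_form(u, ell, v, stand_in)
           for u in range(1, 6) for ell in range(u + 1) for v in (2, 3, 5))

# Theorem 45 versus Proposition 48 with r = v = 3 and eps = 0.5.
for u in (1, 5, 10, 50):
    print(u, round(bnd(u, 3, 0.5), 3), round(prop48_bound(u, 3, 0.5, r=3), 3))
```

For $r = v$ the second column grows linearly in $u$, while the third stays below $\log_2 v$.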

Let us clarify that there is no contradiction with the fact that the bound $\mathit{Bnd}(u,v,\varepsilon)$ is tight: it is tight when we are free to choose the range, but here we fix the dimension of the range.
5.5.1 Measuring the leakage about an individual

As discussed in Section 5.2, the main goal of differential privacy is not to protect information about the complete database, but about each of its individual participants. To capture the leakage about a particular individual, we start from a tuple $x^- \in \mathit{Val}^{u-1}$ containing the given (and known) values of all other $u - 1$ individuals. Then we create a channel whose input $V$ ranges over the values in $\mathit{Val}$ and represents the value of our individual of interest. Note that this means that we take into consideration all possible input databases where the values of the other individuals are exactly those of $x^-$ and only the value of the selected individual varies. Intuitively, $I_\infty(V;Z)$ measures the leakage about the individual's value where all other values are known to be as in $x^-$. (Similarly, $H_\infty(V|Z)$ represents the conditional entropy of $V$ given $Z$ for a fixed database where all other values are $x^-$.) As all these databases are adjacent, differential privacy provides a stronger bound for this leakage.

Therefore, the leakage for a single individual can be characterized as follows.

Proposition 49. Assume that $\mathcal{K}$ satisfies $\varepsilon$-differential privacy. Then the information leakage for an individual is bounded from above by:

$$I_\infty(V;Z) \le \log_2 \frac{v\, e^{\varepsilon}}{v - 1 + e^{\varepsilon}}$$
Proof. Let us fix a database $x$, and a particular individual $i$ in $\mathit{Ind}$. There are $v - 1$ possible ways to change the value of $i$ in $x$. All the new databases obtained in this way are adjacent to each other, i.e. the graph structure associated to the input is a clique of $v$ nodes. Recall that $N_d$ is the number of elements of the input at distance $d$ from a given element $x$. In this case we have

$$N_d = \begin{cases} 1 & \text{for } d = 0,\\ v - 1 & \text{for } d = 1,\\ 0 & \text{otherwise.} \end{cases}$$

By substituting this value of $N_d$ in Theorem 43, we get

$$H_\infty(V|Z) \ge -\log_2 \frac{1}{1 + \frac{v-1}{e^{\varepsilon}}} = -\log_2 \frac{e^{\varepsilon}}{v - 1 + e^{\varepsilon}}$$
The particular individual can present $v$ different values, and thus in the case the input distribution is uniform its min-entropy is $H_\infty(V) = \log_2 v$. Hence

$$\begin{aligned}
I_\infty(V;Z) &= H_\infty(V) - H_\infty(V|Z) && \text{(by definition)}\\
&\le \log_2 v + \log_2 \frac{e^{\varepsilon}}{v - 1 + e^{\varepsilon}} && \text{(by the derivations above)}\\
&= \log_2 \frac{v\, e^{\varepsilon}}{v - 1 + e^{\varepsilon}}
\end{aligned}$$

Since the min-entropy leakage is maximum in the case of the uniform input 
distribution, the result follows. 

□ 

Note that the bound on the leakage for an individual does not depend on the size $u$ of $\mathit{Ind}$, nor on the database $x^-$ that we fix.
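A short Python sketch makes this independence visible: the bound of Proposition 49 is a function of $v$ and $\varepsilon$ only. The sketch also builds one channel on the $v$ values of a single individual (a hypothetical example, not taken from the text) whose leakage under the uniform prior equals the bound; since the $v$ inputs are pairwise adjacent, its rows may differ by at most a factor $e^{\varepsilon}$. All names are illustrative.

```python
import math


def individual_bound(v, eps):
    """Proposition 49: log2(v e^eps / (v - 1 + e^eps)); depends only on v and eps."""
    e = math.exp(eps)
    return math.log2(v * e / (v - 1 + e))


def clique_mechanism(v, eps):
    """Hypothetical channel on the v values of one individual: e^eps/(v-1+e^eps) on the
    diagonal, 1/(v-1+e^eps) elsewhere. Rows sum to 1 and any two rows differ by at
    most a factor e^eps, as required since all v inputs are pairwise adjacent."""
    e = math.exp(eps)
    return [[(e if i == j else 1.0) / (v - 1 + e) for j in range(v)] for i in range(v)]


def leakage_uniform(m):
    """I_inf(V;Z) under a uniform prior: log2 of the sum of the column maxima."""
    return math.log2(sum(max(row[z] for row in m) for z in range(len(m[0]))))


for v in (2, 3, 10):
    m = clique_mechanism(v, math.log(2))
    print(v, round(individual_bound(v, math.log(2)), 4), round(leakage_uniform(m), 4))
```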



5.6 Application to utility 

As discussed in Section 5.2, the utility of a randomized function $\mathcal{K}$ is the correlation between the real answers $Y$ for a query and the reported answers $Z$.

For our analysis we assume an oblivious randomization mechanism. As discussed in Section 5.2, in this case the system can be decomposed into the cascade of two channels, and the utility becomes a property of the channel associated to the randomization mechanism $\mathcal{H}$ which maps the real answer $y \in \mathcal{Y}$ into a reported answer $z \in \mathcal{Z}$ according to given probability distributions $p_{Z|Y}(\cdot|\cdot)$. The user, however, does not necessarily take $z$ as her guess for the real answer, since she can use some Bayesian post-processing to maximize the probability of success, i.e. of a right guess. Thus for each reported answer $z$ the user can remap her guess to a value $y' \in \mathcal{Y}$ according to some strategy that maximizes her expected gain.

The standard way to define utility is by means of gain functions (see for instance [BS94]). We define $\mathit{gain} : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, where the value $\mathit{gain}(y,y')$ represents the reward for guessing the answer $y'$ when the correct answer is $y$.

It is natural to define the global utility of the mechanism $\mathcal{H}$ as the expected gain:

$$U(Y,Z) = \sum_{y} p(y) \sum_{y'} p(y'|y)\, \mathit{gain}(y,y') \qquad (5.18)$$

where $p(y)$ is the prior probability of real answer $y$, and $p(y'|y)$ is the probability of the user guessing $y'$ when the real answer is $y$.
Assuming that the user uses a remapping function $\mathit{guess} : \mathcal{Z} \to \mathcal{Y}$, we can derive the following characterization of the utility. Recall that $\delta_x(\cdot)$ represents the probability distribution which has value 1 on $x$ and 0 elsewhere.

$$\begin{aligned}
U(Y,Z) &= \sum_{y} p(y) \sum_{y'} p(y'|y)\, \mathit{gain}(y,y') && \text{(by (5.18))}\\
&= \sum_{y} p(y) \sum_{y'} \Big(\sum_{z} p(z|y)\, p(y'|z)\Big) \mathit{gain}(y,y')\\
&= \sum_{y} p(y) \sum_{y'} \Big(\sum_{z} p(z|y)\, \delta_{y'}(\mathit{guess}(z))\Big) \mathit{gain}(y,y') && (y' = \mathit{guess}(z))\\
&= \sum_{y} \sum_{z} p(y)\, p(z|y) \sum_{y'} \delta_{y'}(\mathit{guess}(z))\, \mathit{gain}(y,y')\\
&= \sum_{y,z} p(y,z) \sum_{y'} \delta_{y'}(\mathit{guess}(z))\, \mathit{gain}(y,y')\\
&= \sum_{y,z} p(y,z)\, \mathit{gain}(y, \mathit{guess}(z)) && (5.19)
\end{aligned}$$
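Equation (5.19) translates directly into code. The Python sketch below (the function names and the two-answer channel are hypothetical) evaluates the expected gain for an arbitrary remapping $\mathit{guess}$ and gain function; with a gain that rewards only exact guesses, trusting the report beats a constant guess.

```python
def expected_gain(prior, channel, guess, gain):
    """Equation (5.19): U(Y,Z) = sum_{y,z} p(y) p(z|y) gain(y, guess(z)).

    prior:   dict y -> p(y)
    channel: dict y -> dict z -> p(z|y)  (the randomization mechanism H)
    guess:   function z -> y'            (the user's remapping)
    gain:    function (y, y') -> number
    """
    return sum(prior[y] * p_zy * gain(y, guess(z))
               for y, row in channel.items()
               for z, p_zy in row.items())


def binary_gain(y, y_guess):
    return 1 if y == y_guess else 0


# Two possible answers, each reported correctly with probability 3/4.
prior = {0: 0.5, 1: 0.5}
channel = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
print(expected_gain(prior, channel, lambda z: z, binary_gain))  # 0.75 (trust the report)
print(expected_gain(prior, channel, lambda z: 0, binary_gain))  # 0.5 (always answer 0)
```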

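We focus here on the so-called binary gain function, which is defined as

$$\mathit{gain}_{\mathit{bin}}(y,y') = \begin{cases} 1 & \text{if } y = y',\\ 0 & \text{otherwise.} \end{cases}$$

Note that in the above equation the value $y'$ represents the user's guess after the observed answer $z$. Therefore we have

$$\mathit{gain}_{\mathit{bin}}(y, \mathit{guess}(z)) = \delta_y(\mathit{guess}(z))$$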

This kind of function represents the case in which there is no reason to pre- 
fer one answer over another, except if it is the correct answer. More precisely, 
we obtain some gain if and only if we guess the right answer. Note that if the 
answer domain is equipped with a notion of distance (i.e. even if two answers 
are wrong, one of them may be "closer" to the correct one than the other) then 
the gain function could take into account the proximity of the reported answer 
to the real one. In this case a "close" answer, even if wrong, is considered 
better than a distant one. We do not assume here a notion of distance, and 
therefore we will focus on the binary case. The use of binary gain functions in 
the context of differential privacy was also investigated in [GRS09]^. 

By substituting $\mathit{gain}$ with $\mathit{gain}_{\mathit{bin}}$ in (5.19) we obtain:

$$U(Y,Z) = \sum_{y,z} p(y,z)\, \delta_y(\mathit{guess}(z)) \qquad (5.20)$$

^The authors of [GRS09] used the dual notion of loss functions instead of gain functions, 
but the final result is equivalent. 
which tells us that the expected utility is the greatest when $\mathit{guess}(z) = y$ is chosen to maximize $p(y,z)$. Assuming that the user chooses such a maximizing remapping, we have:

$$\begin{aligned}
U(Y,Z) &= \sum_{z} \max_{y} p(y,z)\\
&= \sum_{z} \max_{y} \big(p(y)\, p(z|y)\big) && \text{(by the Bayes law)} \quad (5.21)
\end{aligned}$$

If the gain function is binary, and the function $\mathit{guess}$ is chosen to optimize utility (i.e. it represents the user's best strategy), then there is a well-known correspondence between $U$ and the Bayes risk / the a posteriori min-entropy. This correspondence is expressed by the following proposition:

Proposition 50. Assume that function $\mathit{gain}$ is binary and the function $\mathit{guess}$ is optimal. Then:

$$U(Y,Z) = \sum_{z} \max_{y} \big(p(y)\, p(z|y)\big) = 2^{-H_\infty(Y|Z)}$$

Proof. Just substitute (5.21) in the definition of conditional min-entropy: $H_\infty(Y|Z) = -\log_2 \sum_{z} \max_{y} \big(p(y)\, p(z|y)\big)$. □
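Proposition 50 can be illustrated on the mechanism $M_2$ of Table 5.1(b) (2/7 on the diagonal, 1/7 elsewhere) with a uniform prior. The Python sketch below (names are illustrative) computes the utility via (5.21), recomputes it via (5.20) using the optimal remapping $\mathit{guess}(z) = \arg\max_y p(y)p(z|y)$, and compares both with $2^{-H_\infty(Y|Z)}$.

```python
import math
from fractions import Fraction


def best_guess(prior, channel, z):
    """Optimal remapping: the y maximizing p(y) p(z|y), i.e. p(y, z)."""
    return max(channel, key=lambda y: prior[y] * channel[y].get(z, 0))


def optimal_utility(prior, channel):
    """Equation (5.21): U(Y,Z) = sum_z max_y p(y) p(z|y)."""
    outputs = {z for row in channel.values() for z in row}
    return sum(max(prior[y] * channel[y].get(z, 0) for y in channel) for z in outputs)


def conditional_min_entropy(prior, channel):
    """H_inf(Y|Z) = -log2 sum_z max_y p(y) p(z|y)."""
    return -math.log2(optimal_utility(prior, channel))


# Table 5.1(b): 2/7 on the diagonal and 1/7 elsewhere, uniform prior on six cities.
cities = "ABCDEF"
m2 = {y: {z: Fraction(2 if y == z else 1, 7) for z in cities} for y in cities}
uniform = {y: Fraction(1, 6) for y in cities}

u = optimal_utility(uniform, m2)
gained = sum(uniform[y] * m2[y][z] for y in cities for z in cities
             if best_guess(uniform, m2, z) == y)  # (5.20) with the best remapping
print(u, gained, 2 ** -conditional_min_entropy(uniform, m2))
```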

5.6.1 The bound on the utility 

In this section we show that, in some special cases, the fact that $\mathcal{K}$ provides $\varepsilon$-differential privacy induces a bound on the utility as defined in terms of a binary gain function. We start by extending the adjacency relation $\sim$ from the datasets $\mathcal{X}$ to the real answers $\mathcal{Y}$, in such a way that two values in $\mathcal{Y}$ are adjacent if they have pre-images that are adjacent. Intuitively, the function $f$ associated to the query determines a partition on the set of all databases ($\mathcal{X}$, i.e. $\mathit{Val}^u$), and we say that two classes are adjacent if they contain an adjacent pair. More formally:

Definition 51. Given $y, y' \in \mathcal{Y}$, with $y \neq y'$, we say that $y$ and $y'$ are adjacent (notation $y \sim y'$), if and only if there exist $x, x' \in \mathit{Val}^u$ with $x \sim x'$ such that $y = f(x)$ and $y' = f(x')$.

Since $\sim$ is symmetric on databases, it is also symmetric on $\mathcal{Y}$, therefore $(\mathcal{Y}, \sim)$ also forms an undirected graph.
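Definition 51 can be explored by brute force on a small universe. In the Python sketch below, the value set and the counting query are hypothetical illustrations: each of three individuals is absent, votes for cand, or votes for someone else, and two datasets are adjacent when they differ in the value of exactly one individual. The graph induced on the answers of the counting query turns out to be a line.

```python
import itertools


def answer_adjacency(values, u, query):
    """Edges of (Y, ~) induced by Definition 51: y ~ y' iff y != y' and some
    adjacent datasets x ~ x' (differing in one individual) have
    query(x) = y and query(x') = y'."""
    datasets = list(itertools.product(values, repeat=u))
    edges = set()
    for x, x2 in itertools.combinations(datasets, 2):
        if sum(a != b for a, b in zip(x, x2)) == 1:
            y, y2 = query(x), query(x2)
            if y != y2:
                edges.add(frozenset((y, y2)))
    return edges


def votes_for_cand(dataset):
    """Hypothetical counting query: how many individuals voted for 'cand'."""
    return sum(1 for value in dataset if value == "cand")


edges = answer_adjacency(["absent", "cand", "other"], 3, votes_for_cand)
print(sorted(tuple(sorted(edge)) for edge in edges))  # [(0, 1), (1, 2), (2, 3)]
```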

Using the above concept of neighborhood for the inputs of the randomization mechanism $\mathcal{H}$, we can show that in an oblivious mechanism (see Figure 5.2), if the query $f$ is deterministic, then the randomized function $\mathcal{K}$ provides $\varepsilon$-differential privacy with respect to neighbor databases if and only if $\mathcal{H}$ provides $\varepsilon$-differential privacy with respect to neighbor answers. Intuitively, this result follows from the fact that a deterministic query $f$ remaps every database $x \in \mathcal{X}$ to a sole answer $y \in \mathcal{Y}$, working as a sort of "relabeling" that replaces databases by answers in the adjacency graph structure, and therefore preserves $\varepsilon$-differential privacy. Note also that if $\mathcal{K}$ is oblivious, the probability of any reported answer $z \in \mathcal{Z}$ does not depend on the database, but solely on the real answer $y$. Therefore, under a deterministic $f$, if two databases $x$ and $x'$ are mapped to the same value of $y$, then $\mathcal{K}(z|x) = \mathcal{K}(z|x')$ for all $z$.

Proposition 52. If the query function $f$ is deterministic, then the randomized function $\mathcal{K}$ satisfies $\varepsilon$-differential privacy with respect to every pair of neighbor databases $x, x' \in \mathcal{X}$ if and only if the randomization mechanism $\mathcal{H}$ satisfies $\varepsilon$-differential privacy with respect to every pair of neighbor answers $y, y' \in \mathcal{Y}$.

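Proof. Since the matrix $\mathcal{K}$ can be obtained by the product of the two matrices corresponding to $f$ and $\mathcal{H}$, we can derive that, for every pair of neighbor databases $x$ and $x'$ and for every reported answer $z$:

$$\begin{aligned}
\frac{\mathcal{K}(z|x)}{\mathcal{K}(z|x')} &= \frac{\Pr[Z = z \mid X = x]}{\Pr[Z = z \mid X = x']}\\
&= \frac{\sum_{y} \Pr[Y = y \mid X = x]\, \Pr[Z = z \mid Y = y]}{\sum_{y} \Pr[Y = y \mid X = x']\, \Pr[Z = z \mid Y = y]} && \text{(matrix multiplication)}\\
&= \frac{\sum_{y} \delta_{f(x)}(y)\, \Pr[Z = z \mid Y = y]}{\sum_{y} \delta_{f(x')}(y)\, \Pr[Z = z \mid Y = y]} && \text{(since $f$ is deterministic)}\\
&= \frac{\Pr[Z = z \mid Y = f(x)]}{\Pr[Z = z \mid Y = f(x')]} && \text{(applying the Dirac $\delta$)}\\
&= \frac{\mathcal{H}(z|f(x))}{\mathcal{H}(z|f(x'))}
\end{aligned}$$

Therefore it follows immediately that $\frac{\mathcal{K}(z|x)}{\mathcal{K}(z|x')} \le e^{\varepsilon}$ if and only if $\frac{\mathcal{H}(z|f(x))}{\mathcal{H}(z|f(x'))} \le e^{\varepsilon}$.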

□ 
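Proposition 52 can be checked numerically on a toy instance. The Python sketch below composes a deterministic counting query with a channel on its answers, here a truncated geometric mechanism with $\alpha = 1/2$ (two-sided geometric noise clamped to the range of answers, a choice made only for illustration); the universe of three voters and all names are hypothetical. It computes the smallest $\varepsilon$ for which each channel is $\varepsilon$-differentially private, over neighbor databases for $\mathcal{K}$ and over neighbor answers for $\mathcal{H}$, and the two values coincide.

```python
import itertools
import math
from fractions import Fraction


def truncated_geometric(n, alpha):
    """Channel on answers 0..n-1: two-sided geometric noise with parameter alpha,
    clamped to the range (this clamping is the 'truncation')."""
    h = {}
    for y in range(n):
        row = {}
        for z in range(n):
            w = alpha ** abs(z - y)
            row[z] = w / (1 + alpha) if z in (0, n - 1) else w * (1 - alpha) / (1 + alpha)
        h[y] = row
    return h


def compose(query, h, datasets):
    """K(z|x) = sum_y [query(x) = y] H(z|y) = H(z|query(x)) for a deterministic query."""
    return {x: dict(h[query(x)]) for x in datasets}


def dp_level(channel, pairs):
    """Smallest eps with channel[a][z] <= e^eps * channel[b][z] for all (a, b) in pairs."""
    worst = max(channel[a][z] / channel[b][z] for a, b in pairs for z in channel[a])
    return math.log(worst)


datasets = list(itertools.product((0, 1), repeat=3))  # 3 voters; 1 = voted for cand
h = truncated_geometric(4, Fraction(1, 2))           # answers 0..3
k = compose(sum, h, datasets)

db_pairs = [(x, x2) for x in datasets for x2 in datasets
            if sum(a != b for a, b in zip(x, x2)) == 1]
answer_pairs = [(y, y2) for y in range(4) for y2 in range(4) if abs(y - y2) == 1]
print(dp_level(k, db_pairs), dp_level(h, answer_pairs))  # both equal ln 2
```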

The link the above proposition establishes between the randomized function $\mathcal{K}$ and the randomization mechanism $\mathcal{H}$ will help us determine a bound on the utility of $\mathcal{H}$, since, in the case the query $f$ is deterministic, requiring $\mathcal{K}$ to respect $\varepsilon$-differential privacy is equivalent to requiring that $\mathcal{H}$ does.

Theorem 53. Consider a randomization mechanism $\mathcal{H}$, and let $y$ be an element of $\mathcal{Y}$. Assume that the distribution of $Y$ is uniform, that $(\mathcal{Y}, \sim)$ is either distance-regular or $VT^{+}$, and that $\mathcal{H}$ satisfies $\varepsilon$-differential privacy. Let $\delta$ be the diameter of $(\mathcal{Y}, \sim)$ and, for each distance $d \in \{0, 1, \ldots, \delta\}$, let $N_d$ be the number of nodes $y' \in \mathcal{Y}$ at distance $d$ from $y$. Then:

$$U(Y,Z) \le \frac{1}{\sum_{d} \frac{N_d}{(e^{\varepsilon})^d}} \qquad (5.22)$$

Proof. Since $(\mathcal{Y}, \sim)$ is distance-regular or $VT^{+}$, we can apply Theorem 43 to derive that $H_\infty(Y|Z) \ge -\log_2 \frac{1}{\sum_{d} N_d/(e^{\varepsilon})^d}$. Then we just substitute this result in Proposition 50. □



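The above bound is tight, in the sense that (provided $(\mathcal{Y}, \sim)$ is distance-regular or $VT^{+}$) we can construct a mechanism $\mathcal{H}$ which satisfies (5.22) with equality. More precisely, for $0 \le i \le n - 1$ and $0 \le j \le n - 1$, we define $\mathcal{H}$ (here identified with its channel matrix for simplicity) as follows:

$$\mathcal{H}_{i,j} = \frac{\gamma}{(e^{\varepsilon})^{d(i,j)}} \qquad (5.23)$$

where

$$\gamma = \frac{1}{\sum_{d} \frac{N_d}{(e^{\varepsilon})^d}} \qquad (5.24)$$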

Note that $\mathcal{H}$ is a square matrix of dimension $n \times n$, where $n = |\mathcal{Y}|$. This is not a problem because, since we assume $(\mathcal{Y}, \sim)$ to be either distance-regular or $VT^{+}$, via Theorem 43 we can transform the channel matrix into an equivalent one such that all non-zero elements are in the submatrix of dimensions $n \times n$. Let us introduce now $\mathcal{Z}^* = \{0, 1, \ldots, n - 1\}$, i.e. the subset of $\mathcal{Z}$ that excludes the zeroed columns of the channel matrix from $n$ to $m - 1$. Note that for the following result we can safely use the set $\mathcal{Z}^*$ instead of $\mathcal{Z}$ because the zeroed columns do not contribute to the a posteriori entropy, and trivially respect $\varepsilon$-differential privacy.

Theorem 54. Assume $(\mathcal{Y}, \sim)$ is distance-regular or $VT^{+}$ and that the distribution of $Y$ is uniform. Then the matrix $\mathcal{H}$ defined in (5.23) satisfies $\varepsilon$-differential privacy and has maximal utility:

$$U(Y,Z) = \frac{1}{\sum_{d} \frac{N_d}{(e^{\varepsilon})^d}}$$
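Proof. First we prove that the matrix as defined in (5.23) is a channel matrix, i.e. that each row is a probability distribution. For every row $i$:

$$\sum_{j} \mathcal{H}_{i,j} = \sum_{j} \frac{\gamma}{(e^{\varepsilon})^{d(i,j)}} = \gamma \sum_{d} \frac{N_d}{(e^{\varepsilon})^d} = 1 \qquad \text{(by (5.24))}$$

Moreover, if $i \sim h$ then $|d(i,j) - d(h,j)| \le 1$ for every $j$ by the triangle inequality, hence $\mathcal{H}_{i,j} \le e^{\varepsilon} \mathcal{H}_{h,j}$, i.e. $\mathcal{H}$ satisfies $\varepsilon$-differential privacy.

Now we show that the utility is maximum.

$$\begin{aligned}
U(Y,Z) &= \sum_{z \in \mathcal{Z}^*} \max_{y} \big(p(y)\, \mathcal{H}(z|y)\big) && \text{(by (5.21))}\\
&= \frac{1}{|\mathcal{Y}|} \sum_{z \in \mathcal{Z}^*} \max_{y} \mathcal{H}(z|y) && \text{(since $Y$ is uniform)}\\
&= \frac{1}{|\mathcal{Y}|} \sum_{z \in \mathcal{Z}^*} \max_{y} \frac{\gamma}{(e^{\varepsilon})^{d(y,z)}} && \text{(by (5.23))}\\
&= \frac{1}{|\mathcal{Y}|} \sum_{z \in \mathcal{Z}^*} \gamma && \text{(maximum is $d = 0$)}\\
&= \gamma && \text{(since $|\mathcal{Y}| = |\mathcal{Z}^*| = n$)}
\end{aligned}$$

By (5.24) this value coincides with the bound (5.22) of Theorem 53, so no $\varepsilon$-differentially private mechanism can achieve a higher utility. □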



Therefore we can always define $\mathcal{H}$ as in (5.23): the matrix so defined will be a legal channel matrix, and it will satisfy $\varepsilon$-differential privacy. If $(\mathcal{Y}, \sim)$ is neither distance-regular nor $VT^{+}$, then the utility of such $\mathcal{H}$ is not necessarily optimal.

The conditions for the construction of the optimal matrix are strong, but 
there are some interesting scenarios in which they are satisfied. Depending on 
the degree of connectivity c of the graph (3^, ~), we can have [^J — 1 different 
cases (note that the case of c = 1 is not possible because the datasets are fully 
connected via their adjacency relation), whose extremes are: 



• $(\mathcal{Y}, \sim)$ is a clique, i.e. every element has exactly $|\mathcal{Y}| - 1$ adjacent elements.

• $(\mathcal{Y}, \sim)$ is a ring, i.e. every element has exactly two adjacent elements. This is similar to the case of the counting queries considered in [GRS09], with the difference that our "counting" is in arithmetic modulo $|\mathcal{Y}|$.
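The construction (5.23)-(5.24) is easy to mechanize for both extremes. The Python sketch below computes graph distances by breadth-first search and builds the matrix for a clique and for a ring on six answers with $e^{\varepsilon} = 2$, i.e. $\varepsilon = \ln 2$; exact fractions make the first rows directly comparable with Table 5.1(b) and Table 5.2(b). The graph encodings and function names are illustrative assumptions, and the code does not itself check distance-regularity or the $VT^{+}$ property.

```python
from collections import deque
from fractions import Fraction


def bfs_distances(adj, source):
    """Graph distances from source in an undirected graph given by adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def optimal_mechanism(adj, e):
    """Matrix (5.23) with gamma from (5.24); e stands for e^eps. Assumes the graph is
    connected and distance-regular or VT+, so that N_d is the same for every node."""
    nodes = sorted(adj)
    dist = {i: bfs_distances(adj, i) for i in nodes}
    # sum_d N_d / e^d equals the sum of 1 / e^d(i, j) over all nodes j
    gamma = 1 / sum(Fraction(1) / e ** d for d in dist[nodes[0]].values())
    return [[gamma / e ** dist[i][j] for j in nodes] for i in nodes]


def clique(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


two = Fraction(2)  # e^eps for eps = ln 2
h_clique = optimal_mechanism(clique(6), two)
h_ring = optimal_mechanism(ring(6), two)
print([str(x) for x in h_clique[0]])  # ['2/7', '1/7', '1/7', '1/7', '1/7', '1/7']
print([str(x) for x in h_ring[0]])    # ['8/21', '4/21', '2/21', '1/21', '2/21', '4/21']
```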
Remark 55. Note that our method can be applied also when the conditions of Theorem 54 are not met: we can always add "artificial" adjacencies to the graph structure so as to meet those conditions. Namely, for computing the distance in (5.23) we use, instead of $(\mathcal{Y}, \sim)$, a structure $(\mathcal{Y}, \sim')$ which satisfies the conditions of Theorem 54, and such that $\sim \subseteq \sim'$. Naturally, the matrix constructed in this way provides $\varepsilon$-differential privacy, but in general is not optimal. It is clear that, in general, the smaller $\sim'$ is, the higher is the utility.

The matrices generated by (5.23) can be very different, depending on the value of $c$. The next two examples illustrate queries that give rise to the clique and to the ring structures, and show the corresponding matrices.

Example 8. Consider a database with electoral information where each entry corresponds to a voter and contains the following three fields:

• Id: a unique (anonymized) identifier assigned to each voter;

• City: the name of the city where the user voted;

• Candidate: the name of the candidate the user voted for.

Consider the query "What is the city with the greatest number of votes for a given candidate cand?". For such a query the binary utility function could be taken as the natural choice: from the user's point of view, only the right city could give some gain, and all wrong answers would be equally bad. It is easy to see that every two answers are neighbors, i.e. the graph structure of the answers is a clique.

Let us consider the scenario where City = {A, B, C, D, E, F} and assume for simplicity that there is a unique answer for the query, i.e. there are no two cities with exactly the same number of individuals voting for candidate cand. Table 5.1 shows two alternative mechanisms providing $\varepsilon$-differential privacy (with $\varepsilon = \ln 2$). The first one, $M_1$, is based on the truncated geometric mechanism method used in [GRS09] for counting queries (here extended to the case where every two distinct answers are neighbors). The second mechanism, $M_2$, is obtained by applying the definition of (5.23). From Theorem 54 we know that for the uniform input distribution $M_2$ gives optimal utility.

For the uniform input distribution, it is easy to see that $U(M_1) = 0.2242 < 0.2857 = U(M_2)$. Even for non-uniform distributions, our mechanism still provides better utility. For instance, for $p(A) = p(F) = 1/10$ and $p(B) = p(C) = p(D) = p(E) = 1/5$, we have $U(M_1) = 0.2412 < 0.2857 = U(M_2)$. This is not too surprising: the geometric mechanism, as well as the Laplacian mechanism proposed by Dwork, perform very well when the domain of answers is provided with a metric and the utility function is not binary^. It also works

® As we mentioned before, in the metric case the gain function can take into account the 
proximity of the reported answer to the real one, the idea being that a close answer, even if 
wrong, is better than a distant one. 
In/Out  A      B      C      D      E      F
A       0.535  0.060  0.052  0.046  0.040  0.267
B       0.465  0.069  0.060  0.053  0.046  0.307
C       0.405  0.060  0.069  0.060  0.053  0.353
D       0.353  0.053  0.060  0.069  0.060  0.405
E       0.307  0.046  0.053  0.060  0.069  0.465
F       0.267  0.040  0.046  0.052  0.060  0.535

(a) $M_1$: truncated geometric mechanism

In/Out  A    B    C    D    E    F
A       2/7  1/7  1/7  1/7  1/7  1/7
B       1/7  2/7  1/7  1/7  1/7  1/7
C       1/7  1/7  2/7  1/7  1/7  1/7
D       1/7  1/7  1/7  2/7  1/7  1/7
E       1/7  1/7  1/7  1/7  2/7  1/7
F       1/7  1/7  1/7  1/7  1/7  2/7

(b) $M_2$: our mechanism

Table 5.1: Mechanisms for the city with the highest number of votes for candidate cand



well when $(\mathcal{Y}, \sim)$ has low connectivity, in particular in the cases of a ring and of a line. But in this example, we are not in these cases, because we are considering binary gain functions and high connectivity.
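The utilities quoted in Example 8 can be recomputed from Table 5.1 with (5.21). The Python sketch below uses the rounded entries of Table 5.1(a), so its values for $M_1$ agree with the quoted figures only up to rounding (about $10^{-3}$); names are illustrative.

```python
# Table 5.1(a): rows = real answer A..F, columns = reported answer A..F (rounded values)
M1 = [
    [0.535, 0.060, 0.052, 0.046, 0.040, 0.267],
    [0.465, 0.069, 0.060, 0.053, 0.046, 0.307],
    [0.405, 0.060, 0.069, 0.060, 0.053, 0.353],
    [0.353, 0.053, 0.060, 0.069, 0.060, 0.405],
    [0.307, 0.046, 0.053, 0.060, 0.069, 0.465],
    [0.267, 0.040, 0.046, 0.052, 0.060, 0.535],
]
# Table 5.1(b): 2/7 on the diagonal, 1/7 elsewhere
M2 = [[2 / 7 if i == j else 1 / 7 for j in range(6)] for i in range(6)]


def utility(prior, m):
    """Binary-gain utility with the optimal remapping, (5.21): sum_z max_y p(y) m[y][z]."""
    return sum(max(prior[y] * m[y][z] for y in range(len(m))) for z in range(len(m[0])))


uniform = [1 / 6] * 6
skewed = [1 / 10, 1 / 5, 1 / 5, 1 / 5, 1 / 5, 1 / 10]  # p(A) = p(F) = 1/10
for prior in (uniform, skewed):
    print(round(utility(prior, M1), 4), round(utility(prior, M2), 4))
```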

Example 9. Let us consider the same database as the previous example, but now assume a counting query of the form "What is the number of votes for candidate cand?". It is easy to see that each answer has at most two neighbors. More precisely, the graph structure on the answers is a line. For illustration purposes, let us assume that only 5 individuals have participated in the election. Table 5.2 shows two alternative mechanisms providing $\varepsilon$-differential privacy ($\varepsilon = \ln 2$): the truncated geometric mechanism $M_1$ proposed in [GRS09] and the mechanism $M_2$ we propose. Note that in order to apply our method we have first to apply Remark 55 to transform the graph structure from a line into a ring.

Let us consider the uniform prior distribution. We see that the utility of $M_1$ is higher than the utility of $M_2$: the first is 4/9 and the second is 8/21. This does not contradict our theorem, because our matrix is guaranteed to be optimal only in the case of a ring structure, not a line as we have in this example. If the structure were a ring, i.e. if the last row were adjacent to the first one, then $M_1$ would not provide $\varepsilon$-differential privacy. In the case of a line, as in this example, the truncated geometric mechanism has been proved optimal [GRS09].
In/Out  0     1     2     3     4     5
0       2/3   1/6   1/12  1/24  1/48  1/48
1       1/3   1/3   1/6   1/12  1/24  1/24
2       1/6   1/6   1/3   1/6   1/12  1/12
3       1/12  1/12  1/6   1/3   1/6   1/6
4       1/24  1/24  1/12  1/6   1/3   1/3
5       1/48  1/48  1/24  1/12  1/6   2/3

(a) $M_1$: truncated ½-geometric mechanism

In/Out  0     1     2     3     4     5
0       8/21  4/21  2/21  1/21  2/21  4/21
1       4/21  8/21  4/21  2/21  1/21  2/21
2       2/21  4/21  8/21  4/21  2/21  1/21
3       1/21  2/21  4/21  8/21  4/21  2/21
4       2/21  1/21  2/21  4/21  8/21  4/21
5       4/21  2/21  1/21  2/21  4/21  8/21

(b) $M_2$: our mechanism

Table 5.2: Mechanisms for the counting query (5 voters)
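The figures of Example 9 can likewise be recomputed exactly from Table 5.2. The Python sketch below (names are illustrative) encodes both matrices as fractions, computes the uniform-prior utility with (5.21), and checks $\varepsilon$-differential privacy with $e^{\varepsilon} = 2$ both for the line adjacency and for the ring adjacency.

```python
from fractions import Fraction as Fr

# Table 5.2(a): truncated geometric mechanism M1, rows = real answer 0..5
M1 = [[Fr(x) for x in row.split()] for row in (
    "2/3 1/6 1/12 1/24 1/48 1/48",
    "1/3 1/3 1/6 1/12 1/24 1/24",
    "1/6 1/6 1/3 1/6 1/12 1/12",
    "1/12 1/12 1/6 1/3 1/6 1/6",
    "1/24 1/24 1/12 1/6 1/3 1/3",
    "1/48 1/48 1/24 1/12 1/6 2/3",
)]
# Table 5.2(b): our mechanism M2 on the ring 0-1-2-3-4-5-0, entry (8/21) / 2^d(i, j)
M2 = [[Fr(8, 21) / 2 ** min(abs(i - j), 6 - abs(i - j)) for j in range(6)] for i in range(6)]


def utility_uniform(m):
    """(5.21) with a uniform prior: average of the column maxima."""
    return sum(max(row[z] for row in m) for z in range(len(m))) / len(m)


def is_dp(m, pairs, e):
    """True if m[a][z] <= e * m[b][z] for every ordered pair (a, b) and every z."""
    return all(m[a][z] <= e * m[b][z] for a, b in pairs for z in range(len(m)))


line = [(i, i + 1) for i in range(5)]
line += [(b, a) for a, b in line]
ring = line + [(0, 5), (5, 0)]

print(utility_uniform(M1), utility_uniform(M2))                    # 4/9 8/21
print(is_dp(M1, line, 2), is_dp(M1, ring, 2), is_dp(M2, ring, 2))  # True False True
```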



5.7 Related work 

To the best of our knowledge, the first work to investigate the relation between differential privacy and information-theoretic leakage for an individual was [ACDP10]. In this work, the definition of channel was relative to a given database $x$, and the channel inputs were all possible databases adjacent to $x$. Two bounds on leakage were presented, one for the min-entropy, and one for Shannon entropy. Our bound in Proposition 49 is an improvement with respect to the (min-entropy) bound in [ACDP10].

Barthe and Köpf [BK11] were the first to investigate the (more challenging) connection between differential privacy and the min-entropy leakage for the entire universe of possible databases. They considered the "end-to-end differentially private mechanisms", which correspond to what we call the randomized function $\mathcal{K}$ in this chapter, and proposed, like we do, to interpret them as information-theoretic channels. They provided a bound for the leakage, but pointed out that it was not tight in general. They also showed that there cannot be a domain-independent bound, by proving that for any number of individuals $u$ the optimal bound must be at least a certain expression $f(u, \varepsilon)$. Finally, they showed that the question of providing optimal upper bounds for the leakage of $\varepsilon$-differentially private randomized functions in terms of rational functions of $\varepsilon$ is decidable, and left the actual function as an open question. In our work we used rather different techniques and found (independently) the same function $f(u, \varepsilon)$ (the bound in Theorem 43), but we actually proved that $f(u, \varepsilon)$ is the optimal bound^. Another difference between their work and ours is that [BK11] captures the case in which the focus of differential privacy is on hiding participation of individuals in a database, whereas we consider both the participation and the values of the participants.

Clarkson and Schneider also considered differential privacy as a case study of their proposal for quantification of integrity [CS11]. There, the authors analyzed database privacy conditions from the literature (such as differential privacy, $k$-anonymity, and $\ell$-diversity) using their framework for utility quantification. In particular, they studied the relationship between differential privacy and a notion of leakage (which is different from ours; in particular, their definition is based on Shannon entropy) and they provided a tight bound on leakage.

Heusser and Malacaria [HM09] were among the first to explore the application of information-theoretic concepts to database queries. They proposed to model database queries as programs, which allows for statistical analysis of the information leaked by the query. [HM09], however, did not attempt to relate information leakage to differential privacy.

In [GRS09] the authors aimed at obtaining optimal-utility randomization mechanisms while preserving differential privacy. The authors proposed adding noise to the output of the query according to the geometric mechanism. Their framework is very interesting in the sense that it provides a general definition of utility for a mechanism $M$ that captures any possible side information and preference (defined as a loss function) the users of $M$ may have. They proved that the geometric mechanism is optimal in the particular case of counting queries. Our results in Section 5.6 are not restricted to counting queries, but on the other hand we only consider the case of the binary loss function.

5.8 Chapter summary and discussion 

In this chapter we have investigated the relation between $\varepsilon$-differential privacy and leakage, and between $\varepsilon$-differential privacy and utility. Our main contribution was the development of a general technique for determining these relations depending on the graph structure of the input domain, induced by the adjacency relation and by the query. We have considered two particular structures, the distance-regular graphs and the $VT^{+}$ graphs, which allowed us to obtain tight bounds on the leakage and on the utility. We also constructed an optimal randomization mechanism satisfying $\varepsilon$-differential privacy for some special cases.

As future work, we plan to extend our result to other kinds of utility functions. In particular, we are interested in the case in which the answer domain is provided with a metric, and we are interested in taking into account the degree of accuracy of the inferred answer.

^When discussing our result with Barthe and Köpf, they said that they also conjectured that $f(u, \varepsilon)$ is the optimal bound.
Six

Safe equivalences for security properties

"Too much may be the equivalent of none at all."

Lee Loevinger 

In the field of Security, process equivalences have been used to characterize various information-hiding properties (for instance secrecy, anonymity and noninterference) based on the principle that a protocol $P$ with a variable $x$ satisfies such a property if and only if, for every pair of secrets $s_1$ and $s_2$, $P[s_1/x]$ is equivalent to $P[s_2/x]$. We argue that, in the presence of nondeterminism, the above principle may rely on the assumption that the scheduler "works for the benefit of the protocol", and this usually is not a safe assumption. Non-safe equivalences, in this sense, include complete-trace equivalence and bisimulation.

The goal of this chapter is to present a formalism in which we can specify 
admissible schedulers and, correspondingly, safe versions of these equivalences. 
Then we are able to show that safe equivalences can be used to establish 
information-hiding properties. 

Contribution The main contributions of this chapter can be summarized 
as follows. 

• We propose a formalism for concurrent distributed systems which ac- 
counts for both probabilistic and nondeterministic behavior, and in which 
the latter is of two kinds: global and local. The global nondeterminism 
represents the possible interleavings produced by the parallel compo- 
nents, which may be influenced by the attacker. The local nondeter- 
minism is associated to the possible internal choices of each component, 
which may depend on the secrets or other unknown parameters, not con- 
trolled by the attacker. Correspondingly, we split the scheduler into two 
constituents: a global one and a local one. The latter is actually a tuple 
of local schedulers, one for each component of the system. 

• We propose a notion of admissible scheduler for the above systems, 
in which the global constituent is not allowed to see the secrets, and 
each local constituent is not allowed to see any information about the 
other components. We then generalize the standard definition of strong 
(probabilistic) information hiding (such as noninterference and strong 
anonymity) to the case in which also nondeterminism is present, under 
the assumption that the schedulers are admissible. 

• We use admissible schedulers to define safe versions of complete-trace^ 
equivalence and bisimilarity which are specially tuned for security. This 
means that we account for the possibility that the global constituent of 
the scheduler is in collusion with the attacker, and therefore does not 
necessarily help the system to obfuscate the secret. We show that the 
bisimilarity is still a congruence, as in the classical case. 

• We finally show that our notions of safe complete-trace equivalence and 
bisimilarity imply strong information hiding in the sense discussed above. 

Plan of the Chapter This chapter is organized as follows. In Section 6.1 
we review the role equivalences traditionally play in formalizing security prop- 
erties. In Section 6.2 we formalize the notions of distributed systems and 
components used in this chapter. In Section 6.3 we focus on restricting the 
discerning power of global and local schedulers, and in Section 6.4 we present 
our proposal for safe equivalences, namely safe complete-traces and safe bisim- 
ilarity. In Section 6.5 we define the notion of information hiding under the 
novel assumption that nondeterminism is handled partly in a demonic way 
and partly in an angelic way. Finally, in Section 6.6 we review the related 
bibliography, and in Section 6.7 we summarize the chapter and outline some 
future work. 

6.1 The use of equivalences in security 

As we have seen in Chapter 1, one technique used to prevent an attacker from 
inferring the secret from the observables is to create noise, namely to make sure 
that for every execution in which a given secret produces a certain observable, 
there is at least another execution in which a different secret produces the 
same observable. In practice this is often done by using randomization. 

^In this chapter we may refer to "complete traces" simply as "traces". 
In the literature about the foundations of computer security, however, the 
quantitative aspects are often abstracted away, and probabilistic behavior is 
replaced by nondeterministic behavior. Correspondingly, there have been various 
approaches in which information-hiding properties are expressed in terms 
of equivalences based on nondeterminism, especially in a concurrent setting. 
For instance, [SS96] defines anonymity as follows^: a protocol $S$ is anonymous 
if, for every pair of culprits $a$ and $b$, $S[a/x]$ and $S[b/x]$ produce the same 
observable traces. A similar definition is given in [AG99] for secrecy, with the 
difference that $S[a/x]$ and $S[b/x]$ are required to be bisimilar. In [DKR09], an 
electoral system $S$ preserves the confidentiality of the vote if, for any voters $v$ 
and $w$, the observable behavior of $S$ is the same when we swap the votes of $v$ and 
$w$, i.e. if $S[a/v \mid b/w]$ is bisimilar to $S[b/v \mid a/w]$. 

These proposals are based on the implicit assumption that all the nonde- 
terministic executions present in the specification of S will always be possible 
under every implementation of S. Or at least, that the adversary will believe 
so. In concurrency, however, as argued in [CNP09], nondeterminism has a 
rather different meaning: if a specification S contains some nondeterministic 
alternatives, typically it is because we want to abstract from specific imple- 
mentations, such as the scheduling policy. A specification is considered cor- 
rect, with respect to some property, if every alternative satisfies the property. 
Correspondingly, an implementation is considered correct if all executions are 
among those possible in the specification, i.e. if the implementation is a re- 
finement of the specification. There is no expectation that the implementation 
will actually make possible all the alternatives indicated by the specification. 

We argue that the use of nondeterminism in concurrency corresponds to a 
demonic view: the scheduler, i.e. the entity that will decide which alternative 
to select, may try to choose the "worst" alternative. Hence we need to make 
sure that all alternatives are "good", in the sense that they satisfy the intended 
property. In the approaches to formalize security properties mentioned above, 
on the contrary, the interpretation of nondeterminism is angelic: the scheduler 
is expected to actually help the protocol to confuse the adversary and thus 
protect the secret information. 

There is another issue, orthogonal to the angelic/demonic dichotomy, but 
relevant for the achievement of security properties: the scheduler should not be 
able to make its choices dependent on the secret, or else nearly every protocol 
would be insecure, i.e. the scheduler would always be able to leak the secret 
to an external observer (for instance by producing different interleavings of 
the observables, depending on the secret). This remark has been made several 
times already, and several approaches have been proposed to cope with the 
problem of full-information schedulers (aka almighty, omniscient, clairvoyant, 
etc.), see for example [CCK+06a, CCK+06b, CP, CNP09, APvRS]. 

The risk of a naive use of nondeterminism to specify a security property is 
not only that it may rely on an implicit assumption that the scheduler behaves 
angelically, but also that it is clairvoyant (fully informed), i.e. that it peeks 
at the secrets (which it is not supposed to be able to see) to achieve its angelic 
strategy. 

^The actual definition of [SS96] is more complicated, but the spirit is the same. 

Example 10. Consider the following system, presented in a CCS-like syntax: 

$S \stackrel{\mathrm{def}}{=} (c)(A \parallel H_1 \parallel H_2 \parallel \mathit{Corr})$, with $A \stackrel{\mathrm{def}}{=} \bar{c}\langle sec \rangle$, $H_1 \stackrel{\mathrm{def}}{=} c(s).\overline{out}\langle a \rangle$, 
$H_2 \stackrel{\mathrm{def}}{=} c(s).\overline{out}\langle b \rangle$, and $\mathit{Corr} \stackrel{\mathrm{def}}{=} c(s).\overline{out}\langle s \rangle$. The name $sec$ represents a secret. 

It is easy to see that we have $S[a/sec] \sim S[b/sec]$, as shown in the execution 
trees in Figure 6.1. Note that, in order to simulate the rightmost branch in 
$S[a/sec]$, the process $S[b/sec]$ needs to follow its leftmost branch. Vice versa, in 
order to simulate the rightmost branch in $S[b/sec]$, the process $S[a/sec]$ needs 
to follow its middle branch. This means that, in order to achieve bisimulation, 
the scheduler needs to know the secret, and change its choice accordingly. 



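Figure 6.1: Execution trees for Example 10. (a) $S[a/sec]$: from the initial state 
$\bar{c}\langle a \rangle \parallel c(s).\overline{out}\langle a \rangle \parallel c(s).\overline{out}\langle b \rangle \parallel c(s).\overline{out}\langle s \rangle$, the three branches 
lead to $\overline{out}\langle a \rangle$, $\overline{out}\langle b \rangle$ and $\overline{out}\langle a \rangle$, respectively. (b) $S[b/sec]$: the same 
structure with initial component $\bar{c}\langle b \rangle$; the three branches lead to $\overline{out}\langle a \rangle$, 
$\overline{out}\langle b \rangle$ and $\overline{out}\langle b \rangle$, respectively. 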

This example shows a distributed system that intuitively is not secure, 
because one of its components, Corr, reveals whatever secret it receives. Ac- 
cording to the equivalence-based notions of security discussed above, however, 
it is secure, but only thanks to a scheduler that: 






(i) angelically helps the system to protect the secret; and 

(ii) does so by making its choices dependent on the secret. 

We consider these assumptions on the scheduler to be excessively strong. 

Here we do not claim, however, that we should rule out the use of angelic 
nondeterminism in security: on the contrary, angelic nondeterminism can be a 
powerful specification concept. We only advocate a cautious use of this notion. 
In particular, it should not be used in a context in which the scheduler may be 
in collusion with the attacker. The goal of this chapter is to define a framework 
in which we can combine both angelic and demonic nondeterminism in a setting 
in which also probabilistic behavior may be present, and in a context in which 
the scheduler is restricted (i.e. not fully informed). We define "safe" variants of 
typical equivalence relations (complete traces and bisimulation), and we show 
how to use them to characterize information-hiding properties. 

6.2 Distributed systems and components 

In this section we describe the kind of distributed systems we are dealing 
with. We start by introducing a variant of probabilistic automata, that we 
call Tagged Probabilistic Automata (TPA). These systems are parallel com- 
positions of probabilistic processes, called components. Each component is 
equipped with a unique identifier, called tag. Whenever a component (or a 
pair of components in case of synchronization) makes a step, the correspond- 
ing transition will be decorated with the associated tag (or pair of tags). 

Similar systems have already been introduced in [APvRS]. The main differences 
are that here the components may contain nondeterminism. 

6.2.1 Tagged Probabilistic Automata 

We now formalize the notion of TPA. 

Definition 56. A Tagged Probabilistic Automaton (or TPA) is a tuple 
$(Q, T, \mathcal{L}, \hat{q}, \vartheta)$, where $Q$ is a set of states, $T$ is a set of tags, $\mathcal{L}$ is a set 
of actions, $\hat{q} \in Q$ is the initial state, and $\vartheta : Q \to \mathcal{P}(T \times \mathcal{L} \times \mathcal{D}(Q))$ is a 
transition function. 

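In the following we write $q \xrightarrow{tg:a} \mu$ for $(tg, a, \mu) \in \vartheta(q)$, and we use $\mathit{enab}(q)$ 
to denote the tags of the components that are enabled to make a transition. 
More formally: 

$$\mathit{enab}(q) \stackrel{\mathrm{def}}{=} \{ tg \in T \mid \text{there exist } a \in \mathcal{L},\ \mu \in \mathcal{D}(Q) \text{ such that } q \xrightarrow{tg:a} \mu \}$$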
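As a minimal sketch of Definition 56 and of $\mathit{enab}$, the snippet below assumes a hypothetical Python encoding in which states are strings, a tag is either an integer (one component) or a frozenset of two integers (a synchronizing pair), and $\vartheta$ is a dictionary from states to lists of $(tg, a, \mu)$ triples. The state names and action labels are illustrative only; the sample automaton mirrors the TPA of $S[a/sec]$ from Example 11.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding of a TPA transition function theta.
Tag = Union[int, FrozenSet[int]]        # single component or unordered pair
Dist = Dict[str, float]                 # distribution over (named) states
Transition = Tuple[Tag, str, Dist]      # a triple (tg, a, mu)


def enab(theta: Dict[str, List[Transition]], q: str) -> Set[Tag]:
    """Tags tg such that q --tg:a--> mu for some action a and distribution mu."""
    return {tg for (tg, _a, _mu) in theta.get(q, [])}


# TPA of S[a/sec] (Example 11) with hypothetical state names.
theta_a: Dict[str, List[Transition]] = {
    "init": [(frozenset({1, 2}), "tau", {"s2": 1.0}),
             (frozenset({1, 3}), "tau", {"s3": 1.0}),
             (frozenset({1, 4}), "tau", {"s4": 1.0})],
    "s2": [(2, "out_a", {"end": 1.0})],
    "s3": [(3, "out_b", {"end": 1.0})],
    "s4": [(4, "out_a", {"end": 1.0})],
    "end": [],
}
```

Here `enab(theta_a, "init")` returns the three pairs $\{1,2\}$, $\{1,3\}$, $\{1,4\}$, and `enab(theta_a, "s2")` returns $\{2\}$, matching the values stated in Example 11.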

In these systems, we can decompose the scheduler into two parts: a global scheduler, 
which, via tags, decides which component or pair of components makes the 
next move, and a local scheduler, which, also via tags, resolves the internal 
nondeterminism of the selected component. 

We assume that the local scheduler can only select enabled transitions, and 
that the global scheduler can only select enabled components. This means 
that the execution does not stop unless all components are blocked. This is in 
line with the tradition of process algebra and of Markov Decision Processes, 
but contrasts with that of Probabilistic Automata [SL95]. The results in this 
chapter, however, do not depend on this assumption. 

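Definition 57. Let $M = (Q, T, \mathcal{L}, \hat{q}, \vartheta)$ be a TPA. Then: 

• A global scheduler for $M$ is a function $\zeta : \mathit{Paths}^*(M) \to (T \cup \{\bot\})$ such 
that, for all finite paths $\sigma$, if $\mathit{enab}(\mathit{last}(\sigma)) \neq \emptyset$ then $\zeta(\sigma) \in \mathit{enab}(\mathit{last}(\sigma))$, 
and $\zeta(\sigma) = \bot$ otherwise. 

• A local scheduler for $M$ is a function $\xi : \mathit{Paths}^*(M) \to (T \times \mathcal{L} \times \mathcal{D}(Q)) \cup \{\bot\}$ 
such that, for all finite paths $\sigma$, if $\vartheta(\mathit{last}(\sigma)) \neq \emptyset$ then $\xi(\sigma) \in \vartheta(\mathit{last}(\sigma))$, 
and $\xi(\sigma) = \bot$ otherwise. 

• A global scheduler $\zeta$ and a local scheduler $\xi$ for $M$ are compatible if, 
for all finite paths $\sigma$, $\xi(\sigma) = (tg, a, \mu)$ implies $\zeta(\sigma) = tg$, and $\xi(\sigma) = \bot$ 
implies $\zeta(\sigma) = \bot$. 

• A scheduler is a pair $(\zeta, \xi)$ of compatible global and local schedulers. 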
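The compatibility condition can be checked mechanically on any finite set of paths. The sketch below is hypothetical: it encodes $\bot$ as Python's `None`, represents both schedulers as plain functions on opaque path objects, and only inspects the paths it is given, whereas the definition quantifies over all finite paths.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

Choice = Tuple[Hashable, str, dict]     # a local choice (tg, a, mu)


def compatible(zeta: Callable[[object], Optional[Hashable]],
               xi: Callable[[object], Optional[Choice]],
               paths: Iterable[object]) -> bool:
    """Check the compatibility clause of Definition 57 on the given paths.

    None plays the role of the undefined choice (bottom)."""
    for sigma in paths:
        local = xi(sigma)
        if local is None:
            if zeta(sigma) is not None:      # xi = bottom requires zeta = bottom
                return False
        elif zeta(sigma) != local[0]:        # xi = (tg, a, mu) requires zeta = tg
            return False
    return True
```

For instance, a global scheduler that returns tag 1 on a path where the local scheduler picks a transition tagged 1, and `None` wherever the local scheduler returns `None`, passes the check; returning any other tag on that path fails it.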
6.2.2 Components 

We will use a simple probabilistic process calculus, very close to the CCSp we 
introduced in Chapter 2, to specify the components. 

We assume a set of actions or channel names $\mathcal{L}$ with elements $a, a_1, a_2, \ldots$, 
including the special symbol $\tau$ denoting a silent step. Except for $\tau$, each action 
$a$ has a co-action $\bar{a} \in \mathcal{L}$, and we assume $\bar{\bar{a}} = a$. Components are specified by 
the following grammar: 

$$q ::= 0 \;\big|\; a.q \;\big|\; q_1 + q_2 \;\big|\; \textstyle\sum_i p_i : q_i \;\big|\; q_1 \mid q_2 \;\big|\; (a)q \;\big|\; Q$$

The constructs $0$, $a.q$, $q_1 + q_2$, $q_1 \mid q_2$ and $(a)q$ represent termination, prefix- 
ing, nondeterministic choice, parallel composition, and the restriction operator, 
respectively. $\sum_i p_i : q_i$ is a probabilistic choice, where $p_i$ represents the prob- 
ability of the $i$-th branch and must satisfy $0 \le p_i \le 1$ and $\sum_i p_i = 1$. The 
process call $Q$ is a simple process identifier. For each identifier, we assume 
a corresponding unique process declaration of the form $Q \stackrel{\mathrm{def}}{=} q$. The idea is 
that, whenever $Q$ is executed, it triggers the execution of $q$. Note that $q$ can 
contain $Q$ or another process identifier, which means that our language allows 
(mutual) recursion. We will denote by $\mathit{fn}(q)$ the free channel names occurring 
in $q$, i.e. the channel names not bound by a restriction operator. 
Components' semantics: The operational semantics consists of probabilis- 
tic transitions of the form $q \xrightarrow{a} \mu$, where $q \in Q$ is a process, $a \in \mathcal{L}$ is an action and 
$\mu \in \mathcal{D}(Q)$ is a distribution on processes. They are specified by the following 
rules: 

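$$\mathrm{PRF}\ \frac{}{a.q \xrightarrow{a} \delta_q} \qquad \mathrm{NDT}\ \frac{q_1 \xrightarrow{a} \mu}{q_1 + q_2 \xrightarrow{a} \mu} \qquad \mathrm{PRB}\ \frac{}{\sum_i p_i : q_i \xrightarrow{\tau} \sum_i p_i \cdot \delta_{q_i}}$$

$$\mathrm{PAR}\ \frac{q_1 \xrightarrow{a} \mu}{q_1 \mid q_2 \xrightarrow{a} \mu \mid q_2} \qquad \mathrm{CALL}\ \frac{q \xrightarrow{a} \mu}{Q \xrightarrow{a} \mu}\ \text{if } Q \stackrel{\mathrm{def}}{=} q \qquad \mathrm{COM}\ \frac{q_1 \xrightarrow{a} \delta_{r_1} \quad q_2 \xrightarrow{\bar{a}} \delta_{r_2}}{q_1 \mid q_2 \xrightarrow{\tau} \delta_{r_1 \mid r_2}}$$

$$\mathrm{RST}\ \frac{q \xrightarrow{b} \mu}{(a)q \xrightarrow{b} (a)\mu}\ a, \bar{a} \neq b$$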



We also assume the symmetric versions of the rules NDT, PAR and COM. 
Recall that the symbol $\delta_q$ is the Dirac delta, which assigns probability 1 to $q$ 
and 0 to all other processes. The symbol $\sum_i$ is the summation on distributions. 
Namely, $\sum_i p_i \cdot \mu_i$ is the distribution $\mu$ such that $\mu(x) = \sum_i p_i \cdot \mu_i(x)$. The 
notation $\mu \mid q$ represents the distribution $\mu'$ such that $\mu'(r) = \mu(q')$ if $r = q' \mid q$, 
and $\mu'(r) = 0$ otherwise. Similarly, $(b)\mu$ represents the distribution $\mu'$ such 
that $\mu'(q) = \mu(q')$ if $q = (b)q'$, and $\mu'(q) = 0$ otherwise. 
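The operations on distributions used by the rules can be sketched directly as finite dictionaries. This is an illustration under assumptions: process terms are hypothetical Python values, $q' \mid q$ and $(b)q'$ are encoded as the tuples `("par", q', q)` and `("res", b, q')`, and zero-probability entries are simply omitted.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence

Dist = Dict[Hashable, float]


def dirac(q: Hashable) -> Dist:
    """delta_q: probability 1 on q, 0 (implicitly) on every other process."""
    return {q: 1.0}


def weighted_sum(ps: Sequence[float], mus: Sequence[Dist]) -> Dist:
    """sum_i p_i * mu_i, i.e. mu(x) = sum_i p_i * mu_i(x)."""
    out: Dict[Hashable, float] = defaultdict(float)
    for p, mu in zip(ps, mus):
        for x, m in mu.items():
            out[x] += p * m
    return dict(out)


def par_right(mu: Dist, q: Hashable) -> Dist:
    """mu | q: the mass of q' is moved to the term q' | q."""
    return {("par", x, q): m for x, m in mu.items()}


def restrict(b: str, mu: Dist) -> Dist:
    """(b)mu: the mass of q' is moved to the term (b)q'."""
    return {("res", b, x): m for x, m in mu.items()}
```

Because the maps $q' \mapsto q' \mid q$ and $q' \mapsto (b)q'$ are injective, `par_right` and `restrict` only need to relabel keys; for example, the target of rule PRB for $\frac{1}{2} : q_1 + \frac{1}{2} : q_2$ is `weighted_sum([0.5, 0.5], [dirac("q1"), dirac("q2")])`.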

Remark 58. In some of the examples in this chapter we use an extension of 
our process calculus that allows message passing (cf. Chapter 2). Since the 
expressive power of our calculus with message passing or without it is the same, 
we consider explicit message passing simply as an alias for the corresponding 
encoding into the presentation of the calculus given above. 

6.2.3 Distributed systems 

A distributed system has the form $(A)\, q_1 \parallel q_2 \parallel \cdots \parallel q_n$, where the $q_i$'s are 
components and $A \subseteq \mathcal{L}$. The restriction on $A$ enforces synchronization on the 
channel names belonging to $A$, in accordance with the CCS spirit. 

Systems' semantics: The semantics of a system gives rise to a TPA, where 
the states are terms representing systems during their evolution. A transition 
now has the form $q \xrightarrow{tg:a} \mu$, where $a \in \mathcal{L}$, $\mu \in \mathcal{D}(Q)$, and $tg \in T$ is either the 
tag of the component which makes the move, or an (unordered) pair of tags 
representing the two partners of a synchronization. We can simply define $T$ 
as $T = I \cup \{\{i,j\} \mid i, j \in I,\ i \neq j\}$, where $I = \{1, 2, \ldots, n\}$ is the set of components' identifiers. 
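Interleaving: 

$$\frac{q_i \xrightarrow{a} \sum_k p_k \cdot \delta_{q_{ik}} \qquad a \notin A}{(A)\, q_1 \parallel \cdots \parallel q_i \parallel \cdots \parallel q_n \xrightarrow{i:a} \sum_k p_k \cdot \delta_{(A)\, q_1 \parallel \cdots \parallel q_{ik} \parallel \cdots \parallel q_n}}$$

where $i$ is the tag indicating that the component $i$ is making the step. Note 
that we assume that probabilistic choices are finite. This implies that every 
transition $q \xrightarrow{a} \mu$ can be written $q \xrightarrow{a} \sum_k p_k \cdot \delta_{q_k}$, and justifies the notation 
used in the interleaving rule. 

Synchronization: 

$$\frac{q_i \xrightarrow{a} \delta_{q_i'} \qquad q_j \xrightarrow{\bar{a}} \delta_{q_j'}}{(A)\, q_1 \parallel \cdots \parallel q_i \parallel \cdots \parallel q_j \parallel \cdots \parallel q_n \xrightarrow{\{i,j\}:\tau} \delta_{(A)\, q_1 \parallel \cdots \parallel q_i' \parallel \cdots \parallel q_j' \parallel \cdots \parallel q_n}}$$

where $\{i,j\}$ is the tag indicating that the components making the step are $i$ and 
$j$. Note that it is an unordered pair. Sometimes we will write $i,j$ instead of $\{i,j\}$ 
for simplicity. 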
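To see how the two rules determine the enabled tags of a system, here is a small Python sketch that abstracts each component by the list of actions it can currently perform, writing `c!` for an output and `c?` for an input on channel `c`. This encoding, the function names, and the decision to ignore $\tau$ and probabilistic steps are assumptions made for illustration only. Assuming only $c$ is restricted (so that outputs on $out$ stay observable), the initial state of Example 10 yields exactly the three synchronizing pairs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Union

Tag = Union[int, FrozenSet[int]]


def channel(action: str) -> str:
    """'c!' (output) and 'c?' (input) both use channel 'c'."""
    return action[:-1]


def complementary(a: str, b: str) -> bool:
    """True if a and b are an output and an input on the same channel."""
    return channel(a) == channel(b) and {a[-1], b[-1]} == {"!", "?"}


def enabled_tags(components: Dict[int, List[str]], restricted: Set[str]) -> Set[Tag]:
    """Tags of the steps allowed by the Interleaving and Synchronization rules.

    components[i] lists the actions component i can perform right now."""
    tags: Set[Tag] = set()
    for i, acts in components.items():
        # Interleaving: component i alone, on an action whose channel is not in A.
        if any(channel(a) not in restricted for a in acts):
            tags.add(i)
    for i, j in combinations(sorted(components), 2):
        # Synchronization: components i and j with complementary actions.
        if any(complementary(a, b) for a in components[i] for b in components[j]):
            tags.add(frozenset({i, j}))
    return tags
```

With components 1 to 4 standing for $A$, $H_1$, $H_2$ and $\mathit{Corr}$, the call `enabled_tags({1: ["c!"], 2: ["c?"], 3: ["c?"], 4: ["c?"]}, {"c"})` gives the pairs $\{1,2\}$, $\{1,3\}$, $\{1,4\}$, while a state in which only component 2 can output on $out$ gives $\{2\}$.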

Example 11. Consider again the systems of Example 10. Figures 6.2(a) and 
6.2(b) show the TPAs for $S[a/sec]$ and for $S[b/sec]$, respectively. For simplicity 
we do not write the restriction on channels $c$ and $out$, nor the termination 
symbol $0$. We use $-$ to denote a component that is stuck. The corresponding 
tags are indicated in the figure with numbers above the components. 

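The set of enabled transitions should be clear from the figures. For instance, 
we have $\mathit{enab}(S[a/sec]) = \{\{1,2\}, \{1,3\}, \{1,4\}\}$ and $\mathit{enab}(- \parallel \overline{out}\langle a \rangle \parallel - \parallel -) = \{2\}$. 
The scheduler $\zeta$ defined as 

$$\zeta(\sigma) \stackrel{\mathrm{def}}{=} \begin{cases} \{1,4\} & \text{if } \sigma = S[a/sec] \\ 2 & \text{if } \sigma = S[a/sec] \xrightarrow{\{1,2\}:\tau} (- \parallel \overline{out}\langle a \rangle \parallel - \parallel -) \\ 3 & \text{if } \sigma = S[a/sec] \xrightarrow{\{1,3\}:\tau} (- \parallel - \parallel \overline{out}\langle b \rangle \parallel -) \\ 4 & \text{if } \sigma = S[a/sec] \xrightarrow{\{1,4\}:\tau} (- \parallel - \parallel - \parallel \overline{out}\langle a \rangle) \\ \bot & \text{otherwise} \end{cases}$$

is a global scheduler for $S[a/sec]$. 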



Figure 6.2: TPAs for Example 11. The tags 1, 2, 3, 4 label the components $A$, $H_1$, $H_2$ 
and $\mathit{Corr}$. (a) $S[a/sec]$: from $\bar{c}\langle a \rangle \parallel c(s).\overline{out}\langle a \rangle \parallel c(s).\overline{out}\langle b \rangle \parallel c(s).\overline{out}\langle s \rangle$, the 
transitions $\{1,2\}:\tau$, $\{1,3\}:\tau$ and $\{1,4\}:\tau$ are followed by $2:\overline{out}\langle a \rangle$, $3:\overline{out}\langle b \rangle$ and 
$4:\overline{out}\langle a \rangle$, respectively. (b) $S[b/sec]$: the same structure, except that the last branch 
ends with $4:\overline{out}\langle b \rangle$. 

6.3 Admissible schedulers 

In this section we restrict the discerning power of the global and local sched- 
ulers in order to avoid the problem of the information leakage induced by 
clairvoyant schedulers. We impose two kinds of restrictions. For the global 
scheduler, following [APvRS], we assume that it can only see, and keep mem- 
ory of, the observable actions and the components that are enabled, but not 
the secret actions. As for the local scheduler, we assume that the local nonde- 
terminism of each component is resolved on the basis of the view of the history 
local to that component, i.e. the projection of the history of the system on 
that component. In other words, each component has to make decisions based 
only on the history of its own execution; it cannot see anything of the other 
components. 



6.3.1 Restricting global schedulers 

We assume that the set of actions $\mathcal{L}$ is divided into two disjoint sets, the secret 
actions $\mathcal{S}$ and the observable actions $\mathcal{O}$, such that $\mathcal{S} \cup \mathcal{O} = \mathcal{L}$. The secret 
actions are supposed to be invisible to the global scheduler. Formally, this can 
be achieved using a function $\mathit{sift}$ with 

$$\mathit{sift}(a) \stackrel{\mathrm{def}}{=} \begin{cases} \tau & \text{if } a \in \mathcal{S}, \\ a & \text{otherwise.} \end{cases}$$

Then, we restrict the power of the global scheduler by forcing it to make the 
same decisions on paths it cannot tell apart. 

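Definition 59. Given a TPA $M$, a global scheduler $\zeta$ for $M$ is admissible if 
for all paths $\sigma_1$ and $\sigma_2$, $\mathit{view}(\sigma_1) = \mathit{view}(\sigma_2)$ implies $\zeta(\sigma_1) = \zeta(\sigma_2)$, 
where 

$$\mathit{view}(q_0 \xrightarrow{tg_1:a_1} q_1 \xrightarrow{tg_2:a_2} \cdots \xrightarrow{tg_n:a_n} q_n) \stackrel{\mathrm{def}}{=} (\mathit{enab}(q_0), \mathit{sift}(a_1), tg_1)\,(\mathit{enab}(q_1), \mathit{sift}(a_2), tg_2) \cdots (\mathit{enab}(q_{n-1}), \mathit{sift}(a_n), tg_n)$$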

The idea is that view sifts the information of the path that the scheduler 
can see. Since sift "hides" the secrets, the scheduler cannot take different 
decisions based on them. 
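A minimal sketch of $\mathit{sift}$, $\mathit{view}$ and the admissibility check of Definition 59, under a hypothetical encoding: a path is a pair (initial state, list of (tag, action, target) steps), $\mathit{enab}$ is passed in as a function, and secret actions are given as a set of labels. The check can only inspect a finite sample of paths, and each view entry uses the enabled set of the source state of the step, following the definition above.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Step = Tuple[Hashable, str, str]        # (tg, a, target state)
Path = Tuple[str, List[Step]]           # (initial state, steps)


def sift(a: str, secret: Set[str]) -> str:
    """Replace secret actions by tau, leave observable actions unchanged."""
    return "tau" if a in secret else a


def view(path: Path, enab: Callable[[str], Set[Hashable]], secret: Set[str]) -> tuple:
    """Sequence of (enab(q_{k-1}), sift(a_k), tg_k) triples along the path."""
    prev, steps = path
    out = []
    for tg, a, nxt in steps:
        out.append((frozenset(enab(prev)), sift(a, secret), tg))
        prev = nxt
    return tuple(out)


def admissible_global(zeta: Callable[[Path], Hashable], paths: Iterable[Path],
                      enab: Callable[[str], Set[Hashable]], secret: Set[str]) -> bool:
    """Definition 59 on a finite set of paths: equal views force equal choices."""
    seen = {}
    for p in paths:
        v = view(p, enab, secret)
        if v in seen and seen[v] != zeta(p):
            return False
        seen[v] = zeta(p)
    return True
```

Two paths that differ only in a secret action have the same view, so a scheduler choosing differently on them, as the clairvoyant scheduler of Example 10 must, is rejected.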

6.3.2 Restricting local schedulers 

The restriction on local schedulers is based on the idea that a step of the 
component $i$ of a system can only be based on the view that $i$ has of the history, 
i.e. its own history. In order to formalize this restriction, it is convenient to 
introduce the concept of $i$-view of a path $\sigma$, or projection of $\sigma$ on $i$, which we 
will denote by $\sigma|_i$. We define it inductively: 

$$(\sigma \xrightarrow{tg:a} \mu)|_i \stackrel{\mathrm{def}}{=} \begin{cases} \sigma|_i \xrightarrow{i:a} \delta_{q_i'} & \text{if } tg = \{i,j\}, \\ \sigma|_i \xrightarrow{i:a} \mu & \text{if } tg = i, \\ \sigma|_i & \text{otherwise.} \end{cases}$$

In the above definition, the first line represents the case of a synchronization 
step involving the component $i$, where we assume that the premise for $i$ is of 
the form $q_i \xrightarrow{a} \delta_{q_i'}$. The second line represents an interleaving step in which $i$ 
is the active component. The third line represents a step in which the component 
$i$ is idle. 

The restriction to the local scheduler can now be expressed as follows: 

Definition 60. Given a TPA $M$ and a local scheduler $\xi$ for $M$, we say that 
$\xi$ is admissible if, for all paths $\sigma$ and $\sigma'$ such that $\xi(\sigma) = (tg, a, \mu)$ and 
$\xi(\sigma') = (tg', a', \mu')$, we have: 

• if $tg = tg' = i$ and $\sigma|_i = \sigma'|_i$, then $\xi(\sigma) = \xi(\sigma')$; 

• if $tg = tg' = \{i,j\}$, $\sigma|_i = \sigma'|_i$ and $\sigma|_j = \sigma'|_j$, then $\xi(\sigma) = \xi(\sigma')$. 

A pair of compatible schedulers $(\zeta, \xi)$ is called admissible if $\zeta$ and $\xi$ are 
admissible. 
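The projection and the local admissibility condition can be sketched as follows. This is a simplified, hypothetical encoding: each system step is a pair (tag, local), where `local` maps every component taking part in the step to its own (action, successor) pair, so that $\sigma|_i$ keeps exactly the steps in which $i$ participates (the synchronization and interleaving cases above) and drops the idle ones. As before, the check only inspects the paths it is given.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

# One system step: (tg, local), where local maps each component that moves
# to its own (action, successor) pair. Hypothetical encoding.
Step = Tuple[Hashable, Dict[int, Tuple[str, Hashable]]]


def participants(tg: Hashable) -> Tuple[int, ...]:
    """Components named by a tag: a single index or an unordered pair."""
    return tuple(sorted(tg)) if isinstance(tg, frozenset) else (tg,)


def project(path: List[Step], i: int) -> tuple:
    """sigma|_i: keep only the steps in which component i takes part."""
    return tuple(local[i] for tg, local in path if i in participants(tg))


def admissible_local(xi: Callable[[List[Step]], Optional[tuple]],
                     paths: Iterable[List[Step]]) -> bool:
    """Definition 60 on finite paths: same tag and same views force same choice."""
    seen = {}
    for p in paths:
        choice = xi(p)
        if choice is None:
            continue
        tg = choice[0]
        key = (tg, tuple(project(p, k) for k in participants(tg)))
        if key in seen and seen[key] != choice:
            return False
        seen[key] = choice
    return True
```

For example, if component 3 has moved differently in two paths while component 2 has not moved at all, a local scheduler that resolves a choice of component 2 differently on the two paths is rejected, since component 2 cannot see what component 3 did.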
6.4 Safe equivalences 

In this section we revise process equivalence notions to make them safe for 
security. 

6.4.1 Safe complete traces 

We define here a safe version of complete-trace semantics. The idea is that 
we compare two processes based not only on their traces, but also on the 
choices that the global scheduler makes at every step. We do this by recording 
explicitly the tags in the traces. 

Definition 61. Here we define the notion of safe complete traces. 

• Given a TPA $M = (Q, T, \mathcal{L}, \hat{q}, \vartheta)$, the (complete) safe traces of $M$, de- 
noted here by $\mathit{Traces}_s(M)$, are defined as the probabilities of sequences of tags 
and actions corresponding to all possible complete executions, i.e. 

$$\mathit{Traces}_s(M) \stackrel{\mathrm{def}}{=} \{ f : (T \times \mathcal{L})^\infty \to [0,1] \mid \text{there exists an admissible scheduler } (\zeta, \xi) \text{ s.t., for all } t,\ f(t) = P_{M,\zeta,\xi}(\{\sigma \in \mathit{CPaths}(M) \mid \mathit{trace}_{ta}(\sigma) = t\}) \}$$

where $P_{M,\zeta,\xi}$ is the probability measure in $M$ under $(\zeta, \xi)$, and $\mathit{trace}_{ta}$ 
extracts from a path the sequence of tags and actions, i.e. 

$$\mathit{trace}_{ta}(q) = \epsilon, \qquad \mathit{trace}_{ta}(q \xrightarrow{tg:a} \sigma) = tg{:}a \cdot \mathit{trace}_{ta}(\sigma)$$

• We denote by $\mathit{Traces}_s(q)$ the safe traces of the automaton associated to 
a system $q$. 

• Two systems $q_1$ and $q_2$ are safe-trace equivalent, denoted by $q_1 \approx_s q_2$, if 
and only if $\mathit{Traces}_s(q_1) = \mathit{Traces}_s(q_2)$. 

The following example points out the difference between $\approx_s$ and the stan- 
dard (complete) trace equivalence. 

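Example 12. Consider the TPAs of Example 11. The two TPAs have the 
same complete traces. In fact we have 

$$\mathit{Traces}(S[a/sec]) = \{\tau \cdot \overline{out}\langle a \rangle,\ \tau \cdot \overline{out}\langle b \rangle\} = \mathit{Traces}(S[b/sec])$$

On the other hand, we have 

$$\mathit{Traces}_s(S[a/sec]) = \{f_1, f_2, f_3\} \neq \{f_1, f_2, f_4\} = \mathit{Traces}_s(S[b/sec])$$

where 

$$f_1(t) = \begin{cases} 1 & \text{if } t = \{1,2\}{:}\tau \cdot 2{:}\overline{out}\langle a \rangle, \\ 0 & \text{for all other values of } t \in (T \times \mathcal{L})^\infty \end{cases} \qquad f_2(t) = \begin{cases} 1 & \text{if } t = \{1,3\}{:}\tau \cdot 3{:}\overline{out}\langle b \rangle, \\ 0 & \text{for all other values of } t \in (T \times \mathcal{L})^\infty \end{cases}$$

$$f_3(t) = \begin{cases} 1 & \text{if } t = \{1,4\}{:}\tau \cdot 4{:}\overline{out}\langle a \rangle, \\ 0 & \text{for all other values of } t \in (T \times \mathcal{L})^\infty \end{cases} \qquad f_4(t) = \begin{cases} 1 & \text{if } t = \{1,4\}{:}\tau \cdot 4{:}\overline{out}\langle b \rangle, \\ 0 & \text{for all other values of } t \in (T \times \mathcal{L})^\infty \end{cases}$$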
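The contrast in Example 12 can be reproduced with a few lines of Python. In this example every admissible scheduler resolves each choice deterministically, so each safe trace $f_k$ is the point distribution on one tagged trace; the sketch represents $f_k$ by that trace (an assumption that holds only for this example). Tags 1 to 4 stand for $A$, $H_1$, $H_2$ and $\mathit{Corr}$ as in Figure 6.2; the function names are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical tags: 1 = A, 2 = H1, 3 = H2, 4 = Corr (as in Figure 6.2).
Trace = Tuple[Tuple[object, str], ...]


def branches(secret: str) -> Set[Trace]:
    """Tagged complete traces of S[secret/sec], one per maximal execution."""
    outputs = {2: "a", 3: "b", 4: secret}     # Corr outputs what it receives
    return {((frozenset({1, k}), "tau"), (k, "out_" + v)) for k, v in outputs.items()}


def plain_traces(secret: str) -> Set[Tuple[str, ...]]:
    """Forget the tags: the standard complete traces."""
    return {tuple(a for _tg, a in t) for t in branches(secret)}
```

Dropping the tags makes the two systems indistinguishable, while the tagged traces differ exactly in the branch where tag 4 (the component $\mathit{Corr}$) outputs the secret.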



6.4.2 Safe bisimilarity 

In this section we propose a security-safe version of strong bisimulation, that 
we call safe bisimulation. This is an equivalence relation stricter than safe- 
trace equivalence, with the advantage of being a congruence. Since in this 
chapter we assume that schedulers can always observe which component is 
making a step (even a silent step), it does not seem natural to consider weak 
bisimulation. 

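We start with some notation. Given a TPA $M = (Q, T, \mathcal{L}, \hat{q}, \vartheta)$ and a 
global scheduler $\zeta$, we write $q \xrightarrow{a}_{\zeta} \mu$ if there exists $\sigma \in \mathit{Paths}^*(M)$ such that 
$\zeta(\sigma) \neq \bot$, $(\zeta(\sigma), a, \mu) \in \vartheta(q)$, and $q = \mathit{last}(\sigma)$. Note that the restriction to $\zeta$ 
still allows nondeterminism, i.e. there may be $\mu_1, \mu_2$ such that $q \xrightarrow{a_1}_{\zeta} \mu_1$ and 
$q \xrightarrow{a_2}_{\zeta} \mu_2$ (with either $a_1 = a_2$ or $a_1 \neq a_2$). 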

We now define the notion of safe bisimulation. The idea is that, if qi and 
q2 are bisimilar states, then every move from qi should be mimicked by a move 
from q2 using the same (admissible) scheduler. 

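Definition 62. Given a TPA $M = (Q, T, \mathcal{L}, \hat{q}, \vartheta)$, we say that a relation 
$\mathcal{R} \subseteq Q \times Q$ is a safe bisimulation if and only if, whenever $q_1 \mathrel{\mathcal{R}} q_2$: 

1. $\mathit{enab}(q_1) = \mathit{enab}(q_2)$, and 

2. for all admissible global schedulers $\zeta$ for $M$ such that $\zeta(\sigma_1) = \zeta(\sigma_2)$ when- 
ever $\mathit{last}(\sigma_1) = q_1$ and $\mathit{last}(\sigma_2) = q_2$: 

• if $q_1 \xrightarrow{a}_{\zeta} \mu_1$, then there exists $\mu_2$ such that $q_2 \xrightarrow{a}_{\zeta} \mu_2$ and $\mu_1 \mathrel{\overline{\mathcal{R}}} \mu_2$, 
and 

• if $q_2 \xrightarrow{a}_{\zeta} \mu_2$, then there exists $\mu_1$ such that $q_1 \xrightarrow{a}_{\zeta} \mu_1$ and $\mu_1 \mathrel{\overline{\mathcal{R}}} \mu_2$, 

where $\mu_1 \mathrel{\overline{\mathcal{R}}} \mu_2$ means that for all equivalence classes $X \in Q/\overline{\mathcal{R}}$ we have 
$\mu_1(X) = \mu_2(X)$, and $\overline{\mathcal{R}}$ is the smallest equivalence relation containing $\mathcal{R}$. 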
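The lifting $\mu_1 \mathrel{\overline{\mathcal{R}}} \mu_2$ used in Definition 62 can be computed on a finite state space: first build the smallest equivalence containing $\mathcal{R}$ with a union-find structure, then compare the mass that both distributions assign to each class. The sketch below assumes distributions are dictionaries over a known finite set of states, and uses a small tolerance to absorb floating-point rounding; the function names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Tuple


def closure(states: Iterable[Hashable],
            relation: Iterable[Tuple[Hashable, Hashable]]) -> Callable[[Hashable], Hashable]:
    """Union-find: map each state to a representative of its class in the
    smallest equivalence relation containing `relation`."""
    parent = {s: s for s in states}

    def find(s: Hashable) -> Hashable:
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path halving
            s = parent[s]
        return s

    for x, y in relation:
        parent[find(x)] = find(y)
    return find


def lifted(mu1: Dict[Hashable, float], mu2: Dict[Hashable, float],
           states: Iterable[Hashable], relation, tol: float = 1e-12) -> bool:
    """mu1 R-bar mu2: both distributions give the same mass to every class."""
    find = closure(states, relation)
    mass: Dict[Hashable, float] = defaultdict(float)
    for q, p in mu1.items():
        mass[find(q)] += p
    for q, p in mu2.items():
        mass[find(q)] -= p
    return all(abs(m) <= tol for m in mass.values())
```

With $\mathcal{R} = \{(r_1, r_2)\}$, the distributions $\delta_{r_1}$ and $\delta_{r_2}$ are related, and so are $\frac{1}{2}\delta_{r_1} + \frac{1}{2}\delta_{r_2}$ and $\delta_{r_2}$, but moving half of the mass to an unrelated state $s$ breaks the relation.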
It is possible to simplify Definition 62, restricting the schedulers to be 
history-independent. In other words, to show that two distributed systems 
are bisimilar, it suffices to consider one-step computations and show that two 
states are equivalent by using only history-independent schedulers. The lemma 
below justifies this claim. 

Lemma 63. Let $M = (Q, T, \mathcal{L}, \hat{q}, \vartheta)$ be a TPA, and let $\mathcal{R}$ be an equivalence 
relation on the set of states $Q$. Consider a global scheduler $\zeta$ for $M$ such 
that, for every pair of states $q_1, q_2 \in Q$, if $q_1 = \mathit{last}(\sigma_1) \mathrel{\mathcal{R}} \mathit{last}(\sigma_2) = q_2$ then 
$\zeta(\sigma_1) = \zeta(\sigma_2)$. In that case $\zeta$ is history-independent, i.e. it depends only on 
the last state of a path $\sigma$. 

Proof. It is easy to see that the relation of having the same last state is an 
equivalence relation on paths, and therefore it determines a partition on the set 
of paths. Since the above $q_1$ and $q_2$ may be identical, the scheduler must give 
the same value on equivalent paths and it is, therefore, history-independent. □ 



Using the lemma above, in the following results about safe bisimulation 
we will usually write $\zeta(q)$, where $q$ is a state. Note however that this does 
not mean that in the computations of safely bisimilar systems the schedulers 
are necessarily history-independent: at each step of the computation we may 
change scheduler, and therefore we may choose a different alternative when we 
pass through the same state $q$ at a later time. 

The following result is analogous to the case of standard bisimulation. It 
implies that the largest safe bisimulation exists, and coincides with the union of 
all safe bisimulations. We call it safe bisimilarity, and we denote it by $\sim_s$. 

Proposition 64. The union of all the safe bisimulations is still a safe bisim- 
ulation. 

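Proof. Assume that $q_1 \sim_s q_2$. Then $q_1 \mathrel{\mathcal{R}} q_2$ holds for some safe bisimulation 
$\mathcal{R}$. Hence we have $\mathit{enab}(q_1) = \mathit{enab}(q_2)$, and for every global scheduler $\zeta$, if 
$\zeta(q_1) = \zeta(q_2)$ and $q_1 \xrightarrow{a}_{\zeta} \mu_1$, then there exists $\mu_2$ such that $q_2 \xrightarrow{a}_{\zeta} \mu_2$ 
and $\mu_1 \mathrel{\overline{\mathcal{R}}} \mu_2$. This implies that $\mu_1 \sim_s \mu_2$. In fact $\overline{\mathcal{R}}$ (the smallest equivalence 
relation containing $\mathcal{R}$) is a finer relation than $\sim_s$, i.e. $q_1 \mathrel{\overline{\mathcal{R}}} q_2$ implies $q_1 \sim_s q_2$. 
Also, $\overline{\mathcal{R}}$ is an equivalence relation, and therefore it induces a partition on 
each of the equivalence classes $X \in Q/{\sim_s}$. Hence we have, for each $X \in Q/{\sim_s}$, 

$$\mu_1(X) = \sum_{Y \in Q/\overline{\mathcal{R}},\ Y \subseteq X} \mu_1(Y) = \sum_{Y \in Q/\overline{\mathcal{R}},\ Y \subseteq X} \mu_2(Y) = \mu_2(X).$$

□ 

Given two TPAs $M_1 = (Q_1, T, \mathcal{L}, \hat{q}_1, \vartheta_1)$ and $M_2 = (Q_2, T, \mathcal{L}, \hat{q}_2, \vartheta_2)$ shar- 
ing the same set of tags $T$ and actions $\mathcal{L}$, we can define bisimulation and 
bisimilarity across their states, i.e. as relations on $Q_1 \cup Q_2$, in the obvious 
way, by constructing the TPA $M$ with a new initial state $\hat{q}$ with transitions to 
$\delta_{\hat{q}_1}$ and to $\delta_{\hat{q}_2}$, respectively. 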

Given two components or systems $q_1$ and $q_2$, we will say that $q_1$ and $q_2$ are 
safely bisimilar, denoted by $q_1 \sim_s q_2$, if the initial states of the corresponding 
TPAs are safely bisimilar. Note that $q_1 \sim_s q_2$ is possible only if $q_1$ and $q_2$ have 
the same number of active components, where a component is called active if, 
during the execution of the system, it makes at least one step. Moreover, 
in the case of components, or of systems constituted by one component only, 
safe bisimulation and safe bisimilarity coincide with standard bisimulation and 
bisimilarity (denoted by $\sim$), respectively. This is not the case for systems, as 
shown by the following example: 

Example 13. Consider again the TPAs of Example 11. As pointed out earlier 
in this chapter, we have $S[a/sec] \sim S[b/sec]$. Yet $S[a/sec] \not\sim_s S[b/sec]$. To 
show this, let us construct a new TPA (as described before) with initial state 
$\hat{q}$ such that $\hat{q} \to \delta_{S[a/sec]}$ and $\hat{q} \to \delta_{S[b/sec]}$. Now consider the (admissible) 
global scheduler $\zeta$ such that 

It is easy to see that $S[b/sec]$ cannot mimic the transition $4{:}\overline{out}\langle a \rangle$ produced 
by $S[a/sec]$ using the same scheduler $\zeta$. 

We now show that safe bisimulation is a congruence with respect to all the 
operators of our language. In the following theorem, statements 2a and 2b are 
just the standard compositionality result for probabilistic bisimulation. 

Theorem 65. 

1. $\sim_s$ is an equivalence relation. 

2. Let $a \in \mathcal{L}$ be an action and $A, B, B' \subseteq \mathcal{L}$ be sets of restrictions. Let 
$p_1, \ldots, p_n$ be probability values, and let $q, q_1, q_2, \ldots, q_n, q_1', q_2', \ldots, q_n'$ be 
components. 

a) If $q_1 \sim_s q_2$, then $a.q_1 \sim_s a.q_2$, $q_1 + q \sim_s q_2 + q$, $(a)q_1 \sim_s (a)q_2$, 
and $q_1 \mid q \sim_s q_2 \mid q$. 

b) If $q_1 \sim_s q_1', \ldots, q_n \sim_s q_n'$, then $\sum_i p_i : q_i \sim_s \sum_i p_i : q_i'$. 

c) If $(B)\, q_1 \parallel \cdots \parallel q_n \sim_s (B')\, q_1' \parallel \cdots \parallel q_n'$ and $\mathit{fn}(q) \cap (B \cup B') = \emptyset$, 
then $(A \cup B)\, q_1 \parallel \cdots \parallel q \parallel \cdots \parallel q_n \sim_s (A \cup B')\, q_1' \parallel \cdots \parallel q \parallel \cdots \parallel q_n'$. 

Proof. 

1. Although safe bisimulations are not equivalence relations in general, their 
union, i.e. safe bisimilarity, is an equivalence. In fact: 

• It is easy to see that, if $\mathcal{R}$ is a safe bisimulation, then the smallest 
equivalence that includes $\mathcal{R}$, namely $\overline{\mathcal{R}}$, is also a safe bisimulation. 

• From Proposition 64 we know that $\sim_s$ is a safe bisimulation. 

• Hence we derive that $\overline{\sim_s}$ is a safe bisimulation, and therefore $\overline{\sim_s} \subseteq {\sim_s}$. 
But since obviously ${\sim_s} \subseteq \overline{\sim_s}$, we conclude that $\overline{\sim_s} = {\sim_s}$, which 
means that $\sim_s$ is already an equivalence relation. 

2. Assume that $a, A, B, B', p_1, \ldots, p_n, q, q_1, q_2, \ldots, q_n, q_1', q_2', \ldots, q_n'$ are of 
the type prescribed by the hypothesis of the theorem. 

a) Assume $q_1 \sim_s q_2$. 

• Let 

$$\mathcal{R} = \{(a.q_1, a.q_2)\} \cup {\sim_s}.$$

We show that $\mathcal{R}$ is a safe bisimulation, which is sufficient to 
prove that $a.q_1 \sim_s a.q_2$. Note that, since there is only one 
component in each of those states, and it is enabled, we have 
$\mathit{enab}(a.q_1) = \mathit{enab}(a.q_2) = \{1\}$, and $\zeta(a.q_1) = \zeta(a.q_2) = 1$ for 
any global scheduler $\zeta$. Given a global scheduler $\zeta$, there is 
exactly one transition from each of $a.q_1$ and $a.q_2$: these are 
$a.q_1 \xrightarrow{a}_{\zeta} \delta_{q_1}$ and $a.q_2 \xrightarrow{a}_{\zeta} \delta_{q_2}$, respectively, which mimic each 
other in the action $a$. Finally, since $q_1 \sim_s q_2$, we have $\delta_{q_1} \sim_s \delta_{q_2}$ 
and therefore $\delta_{q_1} \mathrel{\overline{\mathcal{R}}} \delta_{q_2}$. 

• Let 

$$\mathcal{R} = \{(q_1 + q, q_2 + q)\} \cup {\sim_s}.$$

We show that $\mathcal{R}$ is a safe bisimulation, which is sufficient to 
prove that $q_1 + q \sim_s q_2 + q$. We have that $\mathit{enab}(q_1 + q) = 
\mathit{enab}(q_1) \cup \mathit{enab}(q) = \mathit{enab}(q_2) \cup \mathit{enab}(q) = \mathit{enab}(q_2 + q)$; in fact 
$\mathit{enab}(q_1) = \mathit{enab}(q_2)$ since $q_1 \sim_s q_2$. Correspondingly, given a 
global scheduler $\zeta$, we have either $\zeta(q_1 + q) = \zeta(q_2 + q) = 1$ or 
$\zeta(q_1 + q) = \zeta(q_2 + q) = \bot$, since there is only one component. 
Assume $q_1 + q \xrightarrow{a}_{\zeta} \mu_1$. We have two cases: either $q_1 \xrightarrow{a}_{\zeta} \mu_1$, or 
$q \xrightarrow{a}_{\zeta} \mu_1$. The second case is obvious. In the first case, since 
$q_1 \sim_s q_2$, we have that also $q_2 \xrightarrow{a}_{\zeta} \mu_2$, with $\mu_1 \sim_s \mu_2$. We derive 
that $\mu_1 \mathrel{\overline{\mathcal{R}}} \mu_2$. For the transitions from $q_2 + q$ we proceed in the 
analogous way. 

• Let 

$$\mathcal{R} = \{((a)q_1, (a)q_2) \mid q_1 \sim_s q_2\}.$$

We show that $\mathcal{R}$ is a safe bisimulation, which is sufficient to 
prove that, if $q_1 \sim_s q_2$, then $(a)q_1 \sim_s (a)q_2$. First observe that 
$\mathit{enab}((a)q_1) = \mathit{enab}(q_1) = \{1\}$ if $q_1$ can make a transition with 
a label different from $a$, otherwise $\mathit{enab}((a)q_1) = \emptyset$. The same 
holds for $(a)q_2$. Since $q_1 \sim_s q_2$, we derive that $\mathit{enab}((a)q_1) = 
\mathit{enab}((a)q_2)$. Accordingly, given a global scheduler $\zeta$, we have 
that either $\zeta((a)q_1) = \zeta((a)q_2) = 1$, or $\zeta((a)q_1) = \zeta((a)q_2) = \bot$. 
Assume $(a)q_1 \xrightarrow{b}_{\zeta} \mu_1$. Then we must have $b \neq a$ and $\mu_1 = 
(a)\mu_1'$, where $q_1 \xrightarrow{b}_{\zeta} \mu_1'$. Since $q_1 \sim_s q_2$, we have also $q_2 \xrightarrow{b}_{\zeta} \mu_2'$, 
with $\mu_1' \sim_s \mu_2'$. We derive $(a)q_2 \xrightarrow{b}_{\zeta} (a)\mu_2'$, and $(a)\mu_1' \mathrel{\overline{\mathcal{R}}} (a)\mu_2'$. 
We proceed in an analogous way for the transitions from $(a)q_2$. 

• The case of the parallel operator in components is similar to 
the case of the parallel operator on systems (see the last item 
of this proof). 



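b) Assume $q_1 \sim_s q_1', \ldots, q_n \sim_s q_n'$. Let 

$$\mathcal{R} = \{(\textstyle\sum_i p_i : q_i,\ \sum_i p_i : q_i')\} \cup {\sim_s}.$$

We show that $\mathcal{R}$ is a safe bisimulation, which is sufficient to prove 
that $\sum_i p_i : q_i \sim_s \sum_i p_i : q_i'$. Observe that both $\sum_i p_i : q_i$ and 
$\sum_i p_i : q_i'$ are enabled, and, since there is only one component, 
$\mathit{enab}(\sum_i p_i : q_i) = \mathit{enab}(\sum_i p_i : q_i') = \{1\}$. Accordingly, if $\zeta$ is a 
global scheduler, we have $\zeta(\sum_i p_i : q_i) = \zeta(\sum_i p_i : q_i') = 1$. 
Given a global scheduler $\zeta$, the only transitions from $\sum_i p_i : q_i$ and 
$\sum_i p_i : q_i'$ are $\sum_i p_i : q_i \xrightarrow{\tau}_{\zeta} \sum_i p_i \cdot \delta_{q_i}$ and $\sum_i p_i : q_i' \xrightarrow{\tau}_{\zeta} \sum_i p_i \cdot \delta_{q_i'}$, 
respectively, which mimic each other in the action $\tau$. It is easy 
to see that we have $(\sum_i p_i \cdot \delta_{q_i}) \sim_s (\sum_i p_i \cdot \delta_{q_i'})$, and therefore 
$(\sum_i p_i \cdot \delta_{q_i}) \mathrel{\overline{\mathcal{R}}} (\sum_i p_i \cdot \delta_{q_i'})$. 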
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c) Let 

' {{AUB) qiW ...\\q\\ ...\\ Qn, 

n=\ {AUB') q[\\...\\q\\...\\q'J 

^ (B) II ... II qn {B') q[ \\ 



We show that 7^ is a safe bisimulation, which is sufficient to prove 
that, if 

(B) (?i II ... II qn ~. {B') q[\\...\\q'^, 

then 

(AUB) qiW ...\\q\\ ...\\qn ~. {AU B') q[ \\ . . . \\ q \\ . . . \\ q'^ . 
Observe first that 

enab{{A U i?) gi || . . . || g || . . . || qn) = 
enab{{A U B') g'^ || . . . || g || . . 



In) 



In fact the enabled components are the same as those of 
(-B) qi II ... II qn and of [B') q[ \\ ... || q'^ (modulo the index 
shift), which are equal by the bisimilarity hypothesis, plus possibly 
the component q, plus possibly the synchronizations with q, which 
again are equal by the bisimilarity hypothesis, minus the transitions 
with labels in A. Note that the hypothesis fn{q) ^ BU B' is essen- 
tial here to guarantee that the component q is enabled (or disabled) 
in both sides. 

Let us consider the synchronization case; the interleaving case is 
just a simplified variant. Given a global scheduler assume 

({(AUB) qi\\...\\q\\...\\qn)= CiiAUB') || . . . || g || . . . || q'J. 

Consider a move from the system in the left-hand side: 

{AU B) qi \\ ■ ■ ■ \\ qi \\ ■ ■ ■ \\ qj || • • • || ^ S(_A)qi\\-\\n\\-\\rj\\-\\qr,- 

Then we must have 

a r a r 

Qi Or^ , Qj Or^ , 

where one of the q^, qj could be q, and 

C{{A U B) qi \\ ■ ■ ■ \\ qi \\ ■ ■ ■ \\ qj \\ ■ ■ ■ \\ qn) = {i, j}- 

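Since $q_i \sim_s q'_i$ and $q_j \sim_s q'_j$ (in case $q_i = q$ then $q'_i = q$ and therefore $q_i \sim_s q'_i$ because $\sim_s$ is reflexive, and analogously for $q_j$), we must have

$$q'_i \xrightarrow{a} \delta_{r'_i}, \qquad q'_j \xrightarrow{\bar{a}} \delta_{r'_j},$$

for some $r'_i$, $r'_j$ such that $\delta_{r_i} \sim_s \delta_{r'_i}$ and $\delta_{r_j} \sim_s \delta_{r'_j}$. We derive that

$$(A \cup B')\; q'_1 \| \ldots \| q'_i \| \ldots \| q'_j \| \ldots \| q'_n \xrightarrow{\tau} \delta_{(A \cup B')\; q'_1 \| \ldots \| r'_i \| \ldots \| r'_j \| \ldots \| q'_n},$$

and, since $\delta_{r_i} \sim_s \delta_{r'_i}$ and $\delta_{r_j} \sim_s \delta_{r'_j}$ imply $r_i \sim_s r'_i$ and $r_j \sim_s r'_j$, by the definition of $\mathcal{R}$ we conclude

$$\delta_{(A \cup B)\; q_1 \| \ldots \| r_i \| \ldots \| r_j \| \ldots \| q_n} \;\mathcal{R}\; \delta_{(A \cup B')\; q'_1 \| \ldots \| r'_i \| \ldots \| r'_j \| \ldots \| q'_n}.$$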

We proceed in an analogous way for the transitions from the right- 
hand side. 

□ 

The following property shows that safe bisimilarity is stronger than safe trace equivalence, as in the standard case.

Proposition 66. If $q_1 \sim_s q_2$ then $q_1 \approx_s q_2$.

Proof. For this proof, it is convenient to consider a coinductive approximation of safe trace equivalence. We start with a coinductive characterization of the safe traces. This in itself is not a key notion of the proof, but it will help in understanding the definition of the approximation.

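Given a TPA $M = (Q, L, \Sigma, \hat{q}, \vartheta)$, consider the operator

$$T_{Tr} : (Q \to \mathcal{P}(CPaths(M) \to [0,1])) \to (Q \to \mathcal{P}(CPaths(M) \to [0,1]))$$

defined as follows: $T_{Tr}(F)(q)$ is the set of all $f : (L \times \Sigma)^{\infty} \to [0,1]$ such that, if $q \not\to$ then $f(\epsilon) = 1$, else $f(\epsilon) = 0$, and, for all $\ell \in L$, $a \in \Sigma$:

• if there exists $\mu$ s.t. $q \xrightarrow{\ell:a} \mu$, then for each $q' \in Q$ there exists $f_{q'} \in F(q')$ s.t. for every $t \in (L \times \Sigma)^{\infty}$, $f(\ell{:}a \cdot t) = \sum_{q'} \mu(q')\, f_{q'}(t)$;

• if $q \stackrel{\ell:a}{\not\longrightarrow}$, then $f(\ell{:}a \cdot t) = 0$ for every $t$;

where $q \not\to$ means that for all $\ell \in L$, $a \in \Sigma$, we have $q \stackrel{\ell:a}{\not\longrightarrow}$.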

Consider the ordering $\sqsubseteq$ on $Q \to \mathcal{P}(CPaths(M) \to [0,1])$ given by

$$F \sqsubseteq F' \text{ if and only if for all } q \in Q,\ F(q) \subseteq F'(q).$$

Clearly $(Q \to \mathcal{P}(CPaths(M) \to [0,1]), \sqsubseteq)$ is a complete lattice and $T_{Tr}$ is monotonic, so by the Knaster-Tarski theorem it has a greatest fixed point, which coincides with $Traces_s$.
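The Knaster-Tarski argument is non-constructive, but on a finite lattice the greatest fixed point of a monotone operator can be computed by iterating downward from the top element. The lattice used for $T_{Tr}$ is not finite (its elements are sets of functions into $[0,1]$), so the following Python sketch only illustrates the principle under that simplifying assumption; the function `greatest_fixed_point` and the toy operator `can_step_into` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, TypeVar

T = TypeVar("T")


def greatest_fixed_point(op: Callable[[FrozenSet[T]], FrozenSet[T]],
                         top: FrozenSet[T]) -> FrozenSet[T]:
    """Downward iteration top, op(top), op(op(top)), ... on a finite powerset.

    For a monotone op the sequence is decreasing and every element contains the
    greatest fixed point; on a finite lattice it must stabilise, and the point
    where it stabilises is therefore the greatest fixed point.
    """
    current = top
    while True:
        nxt = op(current)
        if nxt == current:
            return current
        current = nxt


# Toy monotone operator: X |-> {q | some successor of q lies in X}.
# Its greatest fixed point is the set of states that admit an infinite run.
succ: Dict[int, List[int]] = {0: [1], 1: [0], 2: [3], 3: []}


def can_step_into(xs: FrozenSet[int]) -> FrozenSet[int]:
    return frozenset(q for q, nexts in succ.items() if any(n in xs for n in nexts))


infinite_runs = greatest_fixed_point(can_step_into, frozenset(succ))
print(sorted(infinite_runs))  # [0, 1]
```

Starting from the top element rather than from the empty set is what yields the greatest, rather than the least, fixed point.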
Following the definition of $T_{Tr}$, we now give a coinductive approximation of the equivalence relation induced by $Traces_s$. Given a TPA $M = (Q, L, \Sigma, \hat{q}, \vartheta)$, consider the operator

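$$T_{Treq} : (CPaths(M) \to \mathcal{P}(Q \times Q)) \to (CPaths(M) \to \mathcal{P}(Q \times Q))$$

defined as:

$$q_1\; T_{Treq}(\mathfrak{R})(\epsilon)\; q_2 \stackrel{def}{\iff} (q_1 \not\to\ \iff\ q_2 \not\to)$$

and

$$q_1\; T_{Treq}(\mathfrak{R})(\ell{:}a \cdot t)\; q_2 \stackrel{def}{\iff} \begin{array}{l} \forall \mu_1.\ (q_1 \xrightarrow{\ell:a} \mu_1 \Rightarrow \exists \mu_2.\ (q_2 \xrightarrow{\ell:a} \mu_2 \wedge \mu_1\, \mathfrak{R}(t)\, \mu_2)) \\ \wedge \\ \forall \mu_2.\ (q_2 \xrightarrow{\ell:a} \mu_2 \Rightarrow \exists \mu_1.\ (q_1 \xrightarrow{\ell:a} \mu_1 \wedge \mu_1\, \mathfrak{R}(t)\, \mu_2)) \end{array}$$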

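Consider the ordering $\preceq$ on $CPaths(M) \to \mathcal{P}(Q \times Q)$ given by

$$\mathfrak{R} \preceq \mathfrak{R}' \text{ if and only if for all } t \in CPaths(M),\ \mathfrak{R}(t) \subseteq \mathfrak{R}'(t).$$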

Clearly $(CPaths(M) \to \mathcal{P}(Q \times Q), \preceq)$ is a complete lattice and $T_{Treq}$ is monotonic, hence by the Knaster-Tarski theorem it has a greatest fixed point, which also coincides with the greatest pre-fixed point, i.e. the greatest $\mathfrak{R}$ such that $\mathfrak{R} \preceq T_{Treq}(\mathfrak{R})$. Using the definition of $T_{Tr}$ it is easy to see that, if $\mathfrak{R}$ is a pre-fixed point, and $q_1\, \mathfrak{R}(t)\, q_2$ for all $t \in CPaths(M)$, then $Traces_s(q_1) = Traces_s(q_2)$, i.e. $q_1 \approx_s q_2$. In fact, if $F(q_1) = F(q_2)$, $q_1\, \mathfrak{R}(t)\, q_2$ for all $t \in CPaths(M)$, and $\mathfrak{R}$ is a pre-fixed point of $T_{Treq}$, then $T_{Tr}(F)(q_1) = T_{Tr}(F)(q_2)$.¹ Consider now a safe bisimulation $\mathcal{R}$, and let us lift it to a constant function $\mathfrak{R} : CPaths(M) \to \mathcal{P}(Q \times Q)$ defined as $\mathfrak{R}(t) = \mathcal{R}$. It is easy to see that $\mathfrak{R}$ is a pre-fixed point of $T_{Treq}$.²

Assume now $q_1\, \mathcal{R}\, q_2$. We trivially derive that $q_1\, \mathfrak{R}(t)\, q_2$ for all $t \in CPaths(M)$, from which we conclude $q_1 \approx_s q_2$.

□ 
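The two conjuncts in the definition of $T_{Treq}$ follow the familiar back-and-forth pattern of bisimulation. As a simplified illustration, the Python sketch below computes ordinary strong bisimilarity on a finite, non-probabilistic labelled transition system as the greatest relation $\mathcal{R}$ with $\mathcal{R} \subseteq F(\mathcal{R})$, where $F$ is the operator implemented by `refine`, by refining the full relation until it stabilises. It deliberately ignores probabilities, tags and schedulers, so it is not the safe bisimulation of this chapter; the LTS encoding and the function names are assumptions made for the example. The processes $a.(b + c)$ and $a.b + a.c$ have the same traces but are not bisimilar, which is the standard example showing that trace equivalence does not imply bisimilarity.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Finite LTS: state -> list of (action, successor state).
LTS = Dict[str, List[Tuple[str, str]]]
Rel = FrozenSet[Tuple[str, str]]


def refine(lts: LTS, rel: Rel) -> Rel:
    """One application of the bisimulation operator F to rel."""
    def forth(p: str, q: str) -> bool:
        # every move of p is answered by an equally labelled move of q into rel
        return all(any(b == a and (p2, q2) in rel for b, q2 in lts[q])
                   for a, p2 in lts[p])

    def back(p: str, q: str) -> bool:
        # every move of q is answered by an equally labelled move of p into rel
        return all(any(b == a and (p2, q2) in rel for b, p2 in lts[p])
                   for a, q2 in lts[q])

    return frozenset((p, q) for p, q in rel if forth(p, q) and back(p, q))


def bisimilarity(lts: LTS) -> Rel:
    """Greatest fixed point of refine, reached by iterating down from Q x Q."""
    rel: Rel = frozenset(product(lts, repeat=2))
    while True:
        nxt = refine(lts, rel)
        if nxt == rel:
            return rel
        rel = nxt


# p0 = a.(b + c)   versus   q0 = a.b + a.c  (same traces: a, ab, ac)
lts: LTS = {
    "p0": [("a", "p1")], "p1": [("b", "p2"), ("c", "p3")], "p2": [], "p3": [],
    "q0": [("a", "q1"), ("a", "q2")], "q1": [("b", "q3")], "q2": [("c", "q4")],
    "q3": [], "q4": [],
}
bisim = bisimilarity(lts)
print(("p0", "q0") in bisim)  # False
```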

As in the standard case, the converse does not hold, and safe trace equivalence is not a congruence.³

¹ Note that the condition is only sufficient, because $\sum_{q'_1} \mu_1(q'_1) f_{q'_1}(t) = \sum_{q'_2} \mu_2(q'_2) f_{q'_2}(t)$ may hold even if $\mu_1$ and $\mu_2$ assign different probability to some equivalence class of $\mathfrak{R}(t)$.

² Note that the converse does not hold, i.e. $\mathfrak{R}$ could be a pre-fixpoint of $T_{Treq}$ even if $\mathcal{R}$ is not a bisimulation. This is because $\mathcal{R}$ is sensitive to the (nondeterministic) branching structure, while $\mathfrak{R}$ is not.

³ This is because we are considering the complete traces.
6.5 Safe nondeterministic information hiding

In this section we define the notion of information hiding under the most 
general hypothesis that the nondeterminism is handled partly in a demonic 
way and partly in an angelic way. We assume that the demonic part is in 
the realm of the global scheduler, while the angelic part is controlled by the 
local scheduler. The motivation is that in a protocol the local components 
can be thought of as programs running locally in a single machine, and locally 
predictable and controllable, while the network can be subject to attacks that 
make the interactions unpredictable. 

We recall that, in a purely probabilistic setting, the absence of leakage, as in properties such as noninterference and strong anonymity, is expressed as follows (see for instance [BP]). Given a purely probabilistic automaton $M$ and a sequence $\vec{a} = a_1 a_2 \ldots a_n$, let $P_M([\vec{a}])$ represent the probability measure of all complete paths with trace $\vec{a}$ in $M$. Let $S$ be a protocol containing a variable action $secr$, and let $s$ be a secret action. Let $M_s$ be the automaton corresponding to $S[s/secr]$. Define $Pr(\vec{a} \mid s)$ as $P_{M_s}([\vec{a}])$. Then $S$ is leakage-free if, for every observable trace $\vec{a}$ and for every pair of secrets $s_1$ and $s_2$, we have $Pr(\vec{a} \mid s_1) = Pr(\vec{a} \mid s_2)$.
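For finitely many secrets and observable traces, the purely probabilistic condition above is a direct comparison of conditional distributions. A minimal Python sketch, assuming the values $Pr(\vec{a} \mid s)$ have already been computed and are given as exact fractions (the dictionary layout and the function name `leakage_free_probabilistic` are hypothetical):

```python
from fractions import Fraction
from typing import Dict, Hashable, Tuple

Trace = Tuple[str, ...]


def leakage_free_probabilistic(cond: Dict[Hashable, Dict[Trace, Fraction]]) -> bool:
    """cond[s][a] = Pr(a | s); traces missing from cond[s] have probability 0.

    S is leakage-free iff Pr(a | s1) == Pr(a | s2) for every observable trace a
    and every pair of secrets s1, s2.
    """
    traces = {a for dist in cond.values() for a in dist}
    secrets = list(cond)
    return all(cond[s1].get(a, Fraction(0)) == cond[s2].get(a, Fraction(0))
               for a in traces for s1 in secrets for s2 in secrets)


half = Fraction(1, 2)
# The observable is the secret bit XOR a fair coin: both secrets give the same distribution.
masked = {"s1": {("0",): half, ("1",): half}, "s2": {("0",): half, ("1",): half}}
# The observable is the secret itself.
leaky = {"s1": {("0",): Fraction(1)}, "s2": {("1",): Fraction(1)}}
print(leakage_free_probabilistic(masked), leakage_free_probabilistic(leaky))  # True False
```

Exact fractions avoid spurious inequalities caused by floating-point rounding when the probabilities are compared for equality.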

In a purely nondeterministic setting, on the other hand, the absence of leakage has been characterized in the literature by the property $S[s_1/secr] \equiv S[s_2/secr]$, where $\equiv$ is an equivalence relation such as trace equivalence or bisimulation. As we have argued in the introduction, this definition assumes an angelic interpretation of nondeterminism.

We want to combine the above notions so as to cope with both probability and nondeterminism. Furthermore, we want to extend them to the case in which part of the nondeterminism is interpreted demonically. Let us first introduce some notation.

Let $S$ be a system containing a variable action $secr$. Let $s$ be a secret action. Let $M_s$ be the TPA associated to $S[s/secr]$ and let $(\zeta, \xi)$ be a compatible pair of global and local schedulers for $M_s$. The probability of an observable trace $\vec{a}$ given $s$ is defined as $Pr_{\zeta,\xi}(\vec{a} \mid s) = P_{M_s,\zeta,\xi}([\vec{a}])$.

The global nondeterminism is interpreted demonically, and therefore we need to ensure that the conditional probabilities of an observable, given the two secrets, are calculated with respect to the same global scheduler. On the other hand, the local scheduler is interpreted angelically, and therefore we can compare the conditional probabilities generated by the two secrets as sets under different schedulers. In other words, we have the freedom to match a conditional probability from the first set with one from the other set, without requiring the local scheduler to be the same.

Whether angelic or demonic, we want to avoid clairvoyant schedulers, i.e. a scheduler should not be able to use the secret information to achieve its goals. For this purpose, we require both the global and the local scheduler to be admissible.
Definition 67. A system is leakage-free if, for every pair of secrets $s_1$ and $s_2$, every admissible global scheduler $\zeta$ and every observable trace $\vec{a}$,

$$\{Pr_{\zeta,\xi}(\vec{a} \mid s_1) \mid \xi \text{ is admissible and compatible with } \zeta\} = \{Pr_{\zeta,\xi}(\vec{a} \mid s_2) \mid \xi \text{ is admissible and compatible with } \zeta\}$$
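Assuming the schedulers have already been enumerated, Definition 67 can be checked mechanically: the demonic global scheduler is fixed first, and only then are the conditional probabilities produced by the admissible local schedulers compared as sets. The Python sketch below takes as input a hypothetical table `probs[zeta][s]`, listing one probability function per admissible local scheduler compatible with the admissible global scheduler `zeta`; how such a table would be obtained from a TPA is outside the scope of the sketch.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, List, Tuple

Trace = Tuple[str, ...]
ProbFn = Dict[Trace, Fraction]  # trace -> Pr_{zeta,xi}(trace | s); missing = 0


def leakage_free(probs: Dict[Hashable, Dict[Hashable, List[ProbFn]]]) -> bool:
    """probs[zeta][s] lists one function per admissible local scheduler xi
    compatible with the admissible global scheduler zeta."""
    for per_secret in probs.values():                      # fixed global scheduler
        traces = {a for fns in per_secret.values() for f in fns for a in f}
        for a in traces:
            # the set {Pr_{zeta,xi}(a | s) | xi admissible and compatible}, per secret
            value_sets: List[FrozenSet[Fraction]] = [
                frozenset(f.get(a, Fraction(0)) for f in fns)
                for fns in per_secret.values()
            ]
            if any(vs != value_sets[0] for vs in value_sets):
                return False
    return True


h = Fraction(1, 2)
# One global scheduler; each secret has two compatible local schedulers.
probs = {
    "zeta": {
        "s1": [{("x",): Fraction(1)}, {("x",): h, ("y",): h}],
        "s2": [{("x",): h, ("y",): h}, {("x",): Fraction(1)}],
    }
}
print(leakage_free(probs))  # True
```

The local schedulers are listed in a different order for the two secrets, which does not matter: only the set of values obtained for each trace is compared.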

The safe equivalences defined in Section 6.4 imply the absence of leakage:

Theorem 68. Let $S$ be a system with a variable action $secr$ and assume $S[s_1/secr] \approx_s S[s_2/secr]$ for every pair of secrets $s_1$ and $s_2$. Then $S$ is leakage-free.

Proof. Consider the abstraction operator $\beta$ from safe traces to pairs of the form (tagged observable trace, probability) defined as:

$$(\vec{a}, p) \in \beta(F) \iff \exists f \in F.\ p = \sum \{ f(t) \mid t \text{ has observable trace } \vec{a} \}$$

It is easy to see that $\beta$ is an abstraction, i.e. if $F_1 = F_2$ then $\beta(F_1) = \beta(F_2)$. Therefore, $S[s_1/secr] \approx_s S[s_2/secr]$ implies $\beta(Traces_s(S[s_1/secr])) = \beta(Traces_s(S[s_2/secr]))$. Finally, the latter holds (for every pair of secrets $s_1$, $s_2$) if and only if $S$ is leakage-free.

□ 

Note that the converse is not true, i.e. it is not the case that the leakage-freedom of $S$ implies $S[s_1/secr] \approx_s S[s_2/secr]$. This is because in the definition of safe trace equivalence we compare the set of probability functions (determined by the schedulers) on traces, while in the definition of leakage-freedom we compare the set of probabilities of each trace, which may come from different functions. This additional degree of freedom generated by the local scheduler helps the system to obfuscate the secret, and provides further justification for the adjective "angelic" for the local nondeterminism.
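The gap between the two notions can be seen on a small, hand-made example; the numbers below are illustrative and are not derived from any system in this chapter. Fix one global scheduler and suppose that, for each secret, two local schedulers induce the probability functions below on four complete traces. A comparison of whole functions, as in safe trace equivalence, distinguishes the two secrets, yet every trace receives the same set of probabilities under both secrets, which is what Definition 67 compares. A Python sketch:

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List

h = Fraction(1, 2)
zero = Fraction(0)
traces = ["a", "b", "c", "d"]

# One probability function per local scheduler (hypothetical values).
funcs_s1: List[Dict[str, Fraction]] = [
    {"a": h, "b": h, "c": zero, "d": zero},
    {"a": zero, "b": zero, "c": h, "d": h},
]
funcs_s2: List[Dict[str, Fraction]] = [
    {"a": h, "b": zero, "c": h, "d": zero},
    {"a": zero, "b": h, "c": zero, "d": h},
]


def as_function_set(funcs: List[Dict[str, Fraction]]) -> FrozenSet[FrozenSet]:
    """What safe trace equivalence compares: the set of whole functions."""
    return frozenset(frozenset(f.items()) for f in funcs)


def per_trace_sets(funcs: List[Dict[str, Fraction]]) -> Dict[str, FrozenSet[Fraction]]:
    """What leakage-freedom compares: for each trace, the set of its probabilities."""
    return {t: frozenset(f[t] for f in funcs) for t in traces}


print(as_function_set(funcs_s1) == as_function_set(funcs_s2))  # False
print(per_trace_sets(funcs_s1) == per_trace_sets(funcs_s2))    # True
```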

From the above theorem and from Proposition 66, we also have the following corollary (with the same premises as the previous theorem):

Corollary 69. If $S[s_1/secr] \sim_s S[s_2/secr]$ for every pair of secrets $s_1$ and $s_2$, then $S$ is leakage-free.

6.6 Related work 

The problem of deriving correct implementations from secrecy specifications has already received a lot of attention. One of the first works to address the problem was [Jac89], which showed that the fact that an implementation is a consistent refinement with respect to a specification does not imply that the (information-flow) security properties are preserved. More recently, [AZ06] has proposed a notion of secrecy-preserving refinement, and a simulation-based technique for proving that a system is the refinement of another. [CS08] argues that important classes of security policies such as noninterference and average response time cannot be expressed by the traditional notion of properties, which consist of sets of traces, and proposes to use hyperproperties (sets of properties) instead. [DDM10] addresses the problem of supervisory control, i.e. given a critical system $G$ that may leak confidential information, how to design a controller $C$ so that the system $G|C$ does not leak. An effective algorithm is presented to compute the most permissive controller such that the system is still opaque with respect to a secret.

Concerning angelic and demonic nondeterminism, there are various works 
which investigate their relation and possible combination. In [BvW92] it is 
shown that angelic and demonic nondeterminism are dual. [MCR07] uses 
multi-relations to express specifications involving both angelic and demonic 
nondeterminism. There are two kinds of agents, demonic and angelic ones, and 
there is the point of view of the internal system and the one of the external 
adversary. 

[Mor09] considers the problem of refining specifications while preserving ignorance. While the focus is on the reduction of demonic nondeterminism of the specification, the hidden values are treated essentially in an angelic way.

The problem of the leakage caused by full-information schedulers has also been investigated in the literature. [CCK+06a] and [CCK+06b] work in the framework of probabilistic automata and introduce a restriction on the scheduler for the purpose of making it suitable for applications in security protocols. Their approach is based on dividing the actions of each component of the system into equivalence classes (tasks). The order of execution of different tasks is decided in advance by a so-called task scheduler, which is history-independent and therefore much more restricted than our notion of global scheduler. [APvRS] proposes a notion of system and admissible scheduler very similar to our notion of system and admissible global scheduler. The main difference is that in that work the components are deterministic and therefore there is no notion of local scheduler.

The work in [CP, CNP09] is similar to ours in spirit, but in a sense dual from a technical point of view. Instead of defining a restriction on the class of schedulers, the authors propose a way to specify that a choice is transparent to the scheduler. They achieve this by introducing labels in process terms, used to represent both the states of the execution tree and the next action or step to be scheduled. They make two states indistinguishable to schedulers, and hence the choice between them private, by associating to them the same label. We believe that every scheduler in our formalism can be expressed in theirs, too. In [CNP09] the authors consider the problem of defining a safe version of bisimulation for expressing security properties, which they call demonic bisimulation. The main difference with our work is that we consider a combination of angelic and demonic nondeterminism, and this also affects the definition of bisimulation. Similarly, our definition of leakage-freedom reflects this combination. In [CNP09] the aspect of angelicity is not considered, although they may be able to simulate it with an appropriate labeling.

The fact that full-information schedulers are unrealistic has also been observed in fields other than security. First attempts used restricted schedulers in order to obtain rules for compositional reasoning [dAHJ01]. The justification for those restricted schedulers is the same as for ours, namely, that not all information is available to all entities in the system. That work considers a synchronous parallel composition, however, so the setting is rather different from ours. Later on, it was shown that model checking is undecidable in its general form for the restricted schedulers of [dAHJ01] (see [GD07] and, more recently, [Gir09]). Despite this undecidability, not all results concerning such schedulers have been negative: for instance, the technique of partial-order reduction can be improved by assuming that schedulers can only use partial information [GDF09].

6.7 Chapter summary and discussion 

In this chapter we have observed that some definitions of security properties based on process equivalences may be too naive, in the sense that they assume the scheduler to be angelic and, worse yet, to achieve its angelic strategy by peeking at the secrets. We have presented a formalism allowing us to specify a demonic constituent of the scheduler, possibly in collusion with the attacker, and an angelic one, under the control of the system. We have also considered restrictions on the schedulers to limit the power of what they can see, and extended to our nondeterministic framework the (probabilistic) information-hiding properties such as noninterference and strong anonymity. We have then defined "safe" equivalences. In particular we have defined the notions of safe trace equivalence and safe bisimilarity, and we have shown that the latter is still a congruence. Finally, we have shown that the safe equivalences can be used to prove information-hiding properties.

For the future, we plan to extend our framework to quantitative notions 
of information leakage, possibly based on information theory. We also plan to 
implement model checking techniques to verify information hiding properties 
for our kind of systems. 
Conclusion



"To succeed, jump as quickly at opportunities as you do at conclusions." 

Benjamin Franklin 

In this thesis we concentrated on the problem of information hiding in the sce- 
narios of interactive systems, statistical disclosure control, and the refinement 
of specifications. We started by giving a general overview of the field of infor- 
mation hiding, including a brief description of its historical development. We 
then discussed the main differences between the qualitative and the quantita- 
tive approaches to information hiding, and we introduced the background for 
the three main topics covered in this thesis: information flow (exemplified by 
anonymity), statistical disclosure control, and the refinement of specifications 
into implementations. 

Having adopted the quantitative approach, we then continued to discuss 
the rationale of the use of information theory for quantitative information flow. 
We reviewed several formulations of entropy, with a special focus on Shannon 
entropy and min-entropy, and the related concept of mutual information and 
its interpretation in terms of attacks and information leakage. 

We then proceeded to present the technical contributions of the thesis. We started with the scenario of interactive systems, i.e. systems where secrets and observables can alternate and influence each other during the computation. In this type of system the traditional information-theoretic approach that makes use of classic memoryless channels, and the related concepts of mutual information and classical capacity, no longer works. We proposed to model
interactive systems with a richer notion of channels, namely channels with 
memory and feedback. In this more general model it is possible to split the 
statistical correlation between secrets and observables (that correspond to the 
input and the output of the channel, respectively) into two causal components: 
the directed information from input to output represents the flow of informa- 
tion through the channel, and the directed information from output to input 
corresponds to the way the input is influenced by the output via feedback. We showed that the directed information is the correct measure of leakage in interactive systems, and so is the concept of directed capacity if we are interested in the worst-case leakage. We also proved that our model is a proper extension of the classic one: in the absence of feedback (i.e. interaction) our model collapses into the simpler classic model. Finally, we showed that the capacity of channels with memory and feedback is a continuous function of a pseudometric based on the Kantorovich metric.
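Writing $X^n$ and $Y^n$ for the input and output sequences over a finite horizon $n$, the directed information from input to output is $I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$ (Massey's definition). The Python sketch below evaluates it, and the ordinary mutual information for comparison, from an explicit joint distribution over input and output sequences; it does not model reactors or reaction functions, and the example distribution is hypothetical. In the example the second input merely copies the first output (feedback) while the outputs are pure noise: mutual information reports one bit of correlation, whereas directed information reports no flow from input to output.

```python
from collections import defaultdict
from math import log2
from typing import Callable, Dict, Hashable, Tuple

Seq = Tuple[int, ...]
Joint = Dict[Tuple[Seq, Seq], float]  # (x_1..x_n, y_1..y_n) -> probability


def entropy_of(joint: Joint, view: Callable[[Seq, Seq], Hashable]) -> float:
    """Shannon entropy (bits) of the marginal obtained by applying view to (xs, ys)."""
    marginal: Dict[Hashable, float] = defaultdict(float)
    for (xs, ys), p in joint.items():
        marginal[view(xs, ys)] += p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def directed_information(joint: Joint, n: int) -> float:
    """I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1}), each term expanded as
    H(X^i, Y^{i-1}) + H(Y^i) - H(Y^{i-1}) - H(X^i, Y^i)."""
    total = 0.0
    for i in range(1, n + 1):
        total += (entropy_of(joint, lambda xs, ys: (xs[:i], ys[:i - 1]))
                  + entropy_of(joint, lambda xs, ys: ys[:i])
                  - entropy_of(joint, lambda xs, ys: ys[:i - 1])
                  - entropy_of(joint, lambda xs, ys: (xs[:i], ys[:i])))
    return total


def mutual_information(joint: Joint) -> float:
    return (entropy_of(joint, lambda xs, ys: xs) + entropy_of(joint, lambda xs, ys: ys)
            - entropy_of(joint, lambda xs, ys: (xs, ys)))


# Hypothetical n = 2 example: x1 uniform, outputs y1, y2 uniform noise,
# and the second input copies the first output (x2 = y1), i.e. pure feedback.
feedback: Joint = {((x1, y1), (y1, y2)): 1 / 8
                   for x1 in (0, 1) for y1 in (0, 1) for y2 in (0, 1)}
print(mutual_information(feedback), directed_information(feedback, 2))  # 1.0 0.0
```

For a noiseless channel without feedback the two quantities coincide, in line with the observation that the model collapses into the classic one in the absence of interaction.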

With respect to interactive systems, as future work we want to explore 
algorithms to calculate the leakage and the maximum leakage using our model. 
This is a rather challenging problem, given the exponential growth of reaction functions (a technical aspect of our model) and the quantification over possibly infinitely many reactors (another technicality of our model). We also want
to explore other notions of entropy as a measure of leakage, as for instance the 
min-entropy and the corresponding notion of one-try attack. 

Next, we moved to the problem of statistical disclosure control. We considered the problem of preserving the privacy of individuals participating in a database that allows statistical queries to be posed by users. Under differential privacy, databases that are similar, i.e. that differ in the contents of at most one row, should give statistically "similar" answers to the same query.
This is achieved by introducing noise in the query mechanism to blur the link 
between the reported answer and the data about individuals. We proposed 
a model where the differential privacy mechanism can be split into two channels in cascade, in the case in which the randomization mechanism is oblivious (i.e. it only depends on the real answer to the query, and not on the database itself). The first channel corresponds to the query, and it maps the database
to the real answer to the query. The second channel corresponds to the obliv- 
ious randomization mechanism, and it takes the real answer and maps it to 
a randomized answer to be reported to the user. In this scenario we see the 
leakage as the correlation between the reported answer and the database, and 
the utility as the correlation between the real answer and the reported one. 
We used this model to derive bounds for the leakage and utility based on the level of differential privacy designed for the system (namely the parameter $\epsilon$). As a measure of leakage we adopted the min-entropy leakage, and for utility we used the notion of gain functions, focusing on the binary gain function, which is closely related to min-entropy leakage and Bayes risk. We used the
graph structure on the input domain derived from the adjacency relation on 
databases to derive bounds for the maximum min-entropy leakage of channels. 
We showed that if the graph structure is distance-regular or vertex-transitive (which is always the case for the database domain), then we can derive bounds for the
maximum min-entropy leakage associated to the channel. Finally, we found a 
way of constructing a utility-maximizing randomization function that respects 
differential privacy for a special class of graph structures. 
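The cascade view can be made concrete on a toy instance. The Python sketch below is an illustration under stated assumptions and not the construction of the thesis: it assumes a hypothetical domain of three respondents, each contributing one bit, the counting query as the deterministic query channel, and the truncated geometric mechanism (a standard oblivious mechanism from the literature, with $\alpha = e^{-\epsilon}$) as the randomization channel. It checks $\epsilon$-differential privacy on adjacent databases and computes the min-entropy leakage of the cascade under a uniform prior, $\log_2 \sum_y \max_x C(x, y)$, together with the utility under the binary gain function, $\sum_y \max_i \pi(i) H(i, y)$, where $\pi$ is the prior on real answers.

```python
from itertools import product
from math import exp, log, log2
from typing import List

Matrix = List[List[float]]


def truncated_geometric(n: int, eps: float) -> Matrix:
    """H[i][j] = Pr(report j | true answer i) on answers 0..n (n >= 1), alpha = e^-eps."""
    a = exp(-eps)
    return [[(1 / (1 + a) if j in (0, n) else (1 - a) / (1 + a)) * a ** abs(i - j)
             for j in range(n + 1)] for i in range(n + 1)]


n, eps = 3, log(2)
dbs = list(product((0, 1), repeat=n))              # hypothetical databases: one bit per row
K = [[1.0 if sum(x) == i else 0.0 for i in range(n + 1)] for x in dbs]  # query channel
H = truncated_geometric(n, eps)                    # oblivious randomization channel
C = [[sum(K[x][i] * H[i][y] for i in range(n + 1)) for y in range(n + 1)]
     for x in range(len(dbs))]                     # cascade: database -> reported answer

# epsilon-differential privacy: rows of adjacent databases differ by at most a factor e^eps
adjacent = [(u, v) for u in range(len(dbs)) for v in range(len(dbs))
            if sum(p != q for p, q in zip(dbs[u], dbs[v])) == 1]
is_dp = all(C[u][y] <= exp(eps) * C[v][y] * (1 + 1e-9)
            for u, v in adjacent for y in range(n + 1))

# min-entropy leakage of the cascade under a uniform prior on databases
leakage = log2(sum(max(C[x][y] for x in range(len(dbs))) for y in range(n + 1)))

# utility with the binary gain function: prior pi on real answers induced by the uniform prior
pi = [sum(1 for x in dbs if sum(x) == i) / len(dbs) for i in range(n + 1)]
utility = sum(max(pi[i] * H[i][y] for i in range(n + 1)) for y in range(n + 1))
print(is_dp, round(leakage, 6), round(utility, 6))  # True 1.0 0.5
```

Because the mechanism is oblivious, every row of the cascade is a row of `H`, so the leakage depends on the databases only through the real answers they produce.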

In relation to statistical databases, as future work we intend to extend our 
results to other types of gain functions than the binary one, namely gain functions that take into consideration a notion of distance between answers. We
also want to investigate whether or not non-oblivious randomization mecha- 
nisms can be used to improve utility while still preserving differential privacy. 

The last scenario we investigated in the thesis was the use of equivalence relations to specify security guarantees, which is a common approach when refining specifications into implementations. Under this perspective, two systems (e.g. a specification and its implementation) are considered equivalently secure if they respect some equivalence relation defined to capture the intended security guarantee. Such equivalences include, for instance, trace equivalence and bisimilarity. We showed that a naive use of these equivalences can lead to unrealistic assumptions about the scheduler: (i) that the scheduler is angelic, i.e. that it will help to keep the secret information from the attacker; and (ii)
that the scheduler can peek at the secrets to make its choices. Those assump- 
tions are not safe in practical cases and, therefore, we proposed a model that 
deals with the problem. We introduced a formalism that explicitly separates 
the demonic and angelic parts of the scheduler, and we imposed restrictions 
to limit the power of the scheduler with respect to what it can see. Namely, 
the scheduler cannot peek at the secrets to make its choices. We then defined notions of safe equivalences (safe trace equivalence and safe bisimilarity)
and we showed that the latter is a congruence. Finally, we showed that safe 
equivalences can be used to prove information hiding properties. 

As future work regarding safe equivalences, we want to extend our model 
to quantitative notions based on information theory, and we want to use model 
checking to certify information hiding properties for our systems. 

As a final remark, we believe that information hiding is a very promising field of research, and we are excited by the challenges that lie ahead.